Computers & Industrial Engineering 57 (2009) 37–45
Contents lists available at ScienceDirect
Computers & Industrial Engineering
journal homepage: www.elsevier.com/locate/caie
A query-based cross-language diagnosis tool for distributed decision making support
Pilsung Choe a, Mark R. Lehto a,*, Hyung Jun Park b, Jan Allebach b
a School of Industrial Engineering, Purdue University, 259 Grissom Hall, 315 N. Grant Street, West Lafayette, IN 47907, USA
b School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA
Article info
Article history: Available online 3 December 2008
Keywords: CLIR; Cross-language information retrieval; CLD; Cross-language diagnosis; Print quality troubleshooting; Website localization
0360-8352/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cie.2008.11.020
* Corresponding author. Tel.: +1 765 494 5428.
Abstract
A query translation-based Korean–English cross-language diagnosis (Q-KE-CLD) tool for assisting Korean users diagnosing print defects was developed and then evaluated as a case study of distributed decision making support for nonnative English users. The first step in developing the Q-KE-CLD tool involved collecting and analyzing print defect descriptions in Korean and English. A fuzzy Bayesian model was obtained from the descriptions and the Q-KE-CLD tool was developed. The tool was then experimentally evaluated in four different universities in South Korea. Results showed that Korean subjects generated queries faster (p = 0.008) when entering Korean queries than when entering English queries. In addition, the subjects rated Korean queries as being easier to generate (p = 0.004). Untrained subjects reported that use of the Korean language made it easier to generate queries and identify print defects. The overall results suggested that query translation-based cross-language diagnosis is a feasible approach for localizing troubleshooting websites.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
Complex organizations in general have a decision making structure in which distributed decision makers share information (Barber, Han, & Liu, 2000; Boland et al., 1992; Jennings, 1996). Computer technologies have played an important role in integrating distributed sources of information and improving access (Kraemer & King, 1988; Pinson, Loucã, & Poraitis, 1997). Internet technologies have especially accelerated development of distributed decision making systems in which customers perform tasks such as troubleshooting jointly with decision makers.
In recent years, many companies have developed websites that their customers can use to diagnose product problems (Choe, Kim, Lehto, Lehto, & Allebach, 2006; Foo, Hui, Leong, & Liu, 2000; Kim, Choe, Lehto, & Allebach, 2005). A good troubleshooting website can help minimize the need to maintain expensive call centers, and can improve the customer experience. However, most websites include text that users have to read, understand, and follow to navigate the websites. This can cause problems for nonnative users, such as when a Korean user tries to navigate a website containing written English. The traditional solution has been to develop localized websites which present information to users in their native language (Aykin, 2005; Brandon, 2001). The most expensive and time consuming method of localization is to translate the entire website. An alternative approach proposed in this study focuses on allowing the users to enter queries in the website in their native language to diagnose their problems. One advantage of this approach is that query diagnosis quickly leads to target information, minimizing the process of website navigation (Olston & Chi, 2003). In addition, queries are much easier to translate into other languages than entire websites, which means that query translation-based cross-language information retrieval (Q-CLIR) might be a more practical approach for localization than translating the website itself.
CLIR is defined as a method for finding relevant documents in one language (or target language) using a query expressed in another language (Chen, 2003; Diekema, 2003; Lee, Kageura, & Choi, 2004; Mitkov, 2003; Petrelli, Hansen, Beaulieu, & Sanderson, 2002; Salton, 1970, 1973). Previous applications of Q-CLIR have focused for the most part on allowing users to retrieve information from large databases. It seems reasonable that this approach could be easily applied to help users diagnose or troubleshoot problems with a product in a website, resulting in a method we call query translation-based cross-language diagnosis (Q-CLD). In such an approach, a user would diagnose product problems by entering a query describing the problem. The system would then translate the query and return potential diagnoses.
In this paper, we present a case study where a Q-CLD tool for query translation-based Korean–English cross-language diagnosis (Q-KE-CLD) for print quality troubleshooting was implemented and evaluated. The main objective of the research was to evaluate the Q-KE-CLD tool to see if a query translation-based cross-language diagnostic approach can reduce the time needed by users to diagnose problems, as well as increase users' satisfaction. In addition to being of theoretical interest, this research was prompted by the pressing need of Hewlett Packard and other companies for localized print quality troubleshooting websites. Over the past 5 years, print quality troubleshooting websites for several lines of laser printers have been developed by Purdue University and Hewlett Packard (e.g., http://www.hp.com/cpso-support-new/pq/5500/home.html). The websites allow users to diagnose print defects associated with several different models of Hewlett Packard's laser printers by link-based browsing (Choe et al., 2006; Kim, Choe, Lehto, & Allebach, 2007; Kim et al., 2005; Park et al., 2006). This existing website provided an excellent test bed for evaluating the potential benefit of the Q-CLD tool. This study therefore provided a practical approach to localize the print quality troubleshooting websites based on Q-CLD.
2. Conceptual model and hypotheses
2.1. Conceptual model
In this study, a query translation-based Korean–English cross-language diagnosis (Q-KE-CLD) tool for print defect troubleshooting was evaluated as a case study for a Q-CLD. Fig. 1 presents a conceptual model of how Korean users troubleshoot problems by query diagnoses, which was revised from the four-phase framework of information retrieval (Ma, 2002; Shneiderman, Byrd, & Croft, 1997). The resulting model shown in Fig. 1 assumes that Korean users first formulate a query in either English or Korean that describes the problem they are trying to diagnose. The keywords chosen in the query formation stage should reflect user objectives, domain knowledge and tool knowledge (Ma, 2002). For this particular application, domain knowledge is likely to be an important factor in deciding queries. This follows because Leman (2001) and Leman and Lehto (2003) showed that different user groups tended to use different keywords and terminology to describe print defects.

Fig. 1. Conceptual model of query diagnosis (stages: I. Formation, II. Action, III. Review, IV. Refinement; hypotheses H1, H2, and H3).
In the next step of a query-based diagnosis, referred to as the action stage, the users enter either an English or a Korean query into the system. If the users happen to enter Korean queries in the Q-CLD tool, they are translated into English queries. The translated queries then become inputs to the native English fuzzy Bayesian model. If users enter English queries, the native English fuzzy Bayesian model is used without translation. In the third step, referred to as the review stage, the website shows the result based on the native English fuzzy Bayesian inference. In this stage, users review the results in descending order of probability and decide what action they will take. In the fourth and final stage, referred to as the refinement stage, the users decide whether to enter additional queries or stop at this stage.
2.2. Hypotheses
It was assumed from the conceptual model that a Q-CLD tool would be helpful for nonnative English users to diagnose print defects. Three hypotheses were proposed to test the conceptual model described previously. To test these hypotheses, a fuzzy Bayesian model was built for developing the Q-CLD tool.
The first and second hypotheses are related to user performance and preferences using the Q-CLD tool as described below. The first hypothesis was that nonnative English users will more quickly diagnose print defects when using the Q-CLD tool than when using the English query diagnosis tool. The second hypothesis states that nonnative English users will prefer the Q-CLD tool over the English query diagnosis tool for print quality diagnosis. These hypotheses follow because information processing requires text processing, language comprehension, memory, attention, problem solving and decision making from users (Fig. 1). Subjects for the experiment in this study were native Koreans who speak Korean as their native language and use English as a second language. Therefore, their English was assumed to be insufficient to express their perception of print defects as freely as they would using Korean. This means that nonnative English users would need more information processing time and experience a higher mental workload when using English queries.

Table 1
Types of print defects used in the data collection sessions.

| Orientation | Pattern (a) | Description | Defect (b) and color (c) |
|---|---|---|---|
| Horizontal | P1 | Horizontal lines repeating every 1.4 in. | D1 (C), D2 (M), D3 (Y), D4 (K) |
| Horizontal | P2 | Horizontal lines repeating every 3.7 in. | D5 (C), D6 (M), D7 (Y), D8 (K) |
| Horizontal | P3 | Broad fuzzy horizontal bands repeating every 1.4 in. | D9 (C), D10 (M), D11 (Y), D12 (K) |
| Vertical | P4 | White vertical lines or bands | D13 (C), D14 (M), D15 (Y), D16 (K) |
| Vertical | P5 | Vertical lines or line segments | D17 (C), D18 (M), D19 (Y), D20 (K) |
| Vertical | P6 | Fine vertical lines | D21 (C), D22 (M), D23 (Y), D24 (K) |
| None | P7 | Missing color | D25 (C), D26 (M), D27 (Y), D28 (K) |
| None | P8 | Tinted background | D29 (C), D30 (M), D31 (Y), D32 (K) |

(a) Pi: Pattern i (i = 1, 2, ..., 8).
(b) Dj: Defect j (j = 1, 2, ..., 32).
(c) C, Cyan; M, Magenta; Y, Yellow; K, Black.
The third hypothesis pertains to user performance and preferences using the Q-CLD as a function of domain knowledge. It was hypothesized that nonnative English subject groups having different levels of domain knowledge will not significantly differ in terms of user performance and preferences when using the Q-CLD tool. Leman (2001) analyzed accuracies of the fuzzy Bayesian model for nine subject groups differing in domain knowledge. The latter study showed that there was no significant difference in prediction accuracy among the subject groups. Nonnative English users' performance and preferences using the Q-CLD tool in this study were also expected to be independent of domain knowledge.
3. Q-CLD tool evaluation
3.1. Fuzzy Bayesian model for query diagnosis
A fuzzy Bayesian model for query diagnosis was developed. The first step in model development involved collecting description data on how people perceive 32 typical print defects of a Hewlett Packard laser printer (Table 1). One data set containing descriptions expressed in Korean was obtained using 40 Korean participants at five different universities in South Korea over a period of three weeks from November 21 to December 8, 2006 (Choe, Lehto, & Allebach, 2007). The other data set contained descriptions in English obtained using 40 native English participants at Purdue University in the United States over a period of three weeks from February 11 to March 2, 2007. The procedure for both groups of participants was the same except for the language used to describe the print defects (Choe, Lehto et al., 2007). From the data collection procedure, 1280 Korean descriptions and 1280 English descriptions were obtained. Following data collection, the data were parsed and morphologically analyzed (Choe, Lehto et al., 2007) for both sets of responses using the TextMiner computer program (Lehto, 2004). From each set of 1280 English and Korean descriptions, 960 descriptions were used to train a fuzzy Bayesian model and 320 descriptions were used to evaluate the fuzzy Bayesian model. However, the Korean fuzzy Bayesian model (Choe, 2007) was not considered in this paper, since the focus was on query translation.
A keywords matching approach developed in this research was used for Korean to English (K–E) translation of queries. In this approach, the most frequently used English keyword was chosen for a translation of a Korean keyword. For example, in Korean, has two possibly different basic meanings, , or in this case. can be translated into "go", which was a keyword contained 23 times in the English keywords. , can be translated into any of the keywords "thin", "fine", "close", "small", or "faint" contained in the English keywords. The keyword "thin" was chosen for translation of because it was contained most frequently, 148 times, in the English keywords.
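The selection rule described above can be illustrated with a minimal sketch. It assumes each Korean keyword comes with a list of candidate English keywords; the function names, the placeholder key `KO_KEYWORD`, and every count other than 148 for "thin" and 23 for "go" are hypothetical and are not taken from the study.

```python
from collections import Counter


def build_translation_table(candidates, english_keyword_counts):
    """Map each Korean keyword to its most frequent English candidate."""
    table = {}
    for korean, english_options in candidates.items():
        # Choose the candidate that occurs most often among the English keywords.
        table[korean] = max(
            english_options, key=lambda w: english_keyword_counts.get(w, 0)
        )
    return table


def translate_query(korean_keywords, table):
    """Translate keyword by keyword; keywords without an entry are dropped."""
    return [table[k] for k in korean_keywords if k in table]


# "thin" (148) and "go" (23) are the counts quoted in the text.
# The remaining counts are invented for illustration only.
english_keyword_counts = Counter(
    {"thin": 148, "go": 23, "fine": 60, "close": 12, "small": 30, "faint": 45}
)

# KO_KEYWORD stands in for the ambiguous Korean keyword discussed in the text.
candidates = {"KO_KEYWORD": ["go", "thin", "fine", "close", "small", "faint"]}

table = build_translation_table(candidates, english_keyword_counts)
translated = translate_query(["KO_KEYWORD", "KO_UNKNOWN"], table)
print(translated)
```

In this sketch an unmatched keyword is silently dropped; the source does not say how the tool handled such keywords, so that behavior is an assumption.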
Likelihood ratios (LR) for keywords were then calculated by taking the ratio of the probability of a keyword w_j given a print defect category D_i divided by the probability of the keyword w_j in the description: LR(D_i|w_j) = P(w_j|D_i)/P(w_j). These LR values for keywords provided a basic method to diagnose a print defect given a description.
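As a rough illustration of the likelihood ratio formula, the sketch below estimates P(w_j|D_i) and P(w_j) as the fraction of descriptions that contain the keyword. That estimation choice, the function name, and the toy descriptions are assumptions made for this example; the study's own estimates came from the TextMiner program.

```python
from collections import Counter, defaultdict


def likelihood_ratios(descriptions):
    """Return lr[defect][keyword] = P(keyword | defect) / P(keyword).

    descriptions: list of (defect_label, list_of_keywords) pairs.
    Probabilities are estimated as fractions of descriptions containing
    the keyword (an assumption of this sketch).
    """
    total = len(descriptions)
    per_defect_total = Counter()
    keyword_docs = Counter()
    keyword_docs_by_defect = defaultdict(Counter)

    for defect, keywords in descriptions:
        per_defect_total[defect] += 1
        for w in set(keywords):  # count each keyword once per description
            keyword_docs[w] += 1
            keyword_docs_by_defect[defect][w] += 1

    lr = defaultdict(dict)
    for defect, counts in keyword_docs_by_defect.items():
        for w, n in counts.items():
            p_w_given_d = n / per_defect_total[defect]
            p_w = keyword_docs[w] / total
            lr[defect][w] = p_w_given_d / p_w
    return lr


# Hypothetical training descriptions for two defects from Table 1.
toy_descriptions = [
    ("D13", ["white", "vertical", "line"]),
    ("D13", ["vertical", "band"]),
    ("D25", ["missing", "color"]),
    ("D25", ["color", "faded", "line"]),
]

lr = likelihood_ratios(toy_descriptions)
print(lr["D13"]["vertical"], lr["D13"]["line"])
```

A ratio above 1 means the keyword appears more often in descriptions of that defect than in descriptions overall, so it counts as evidence for that defect.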
The model was analyzed using 320 descriptions for each language to validate the fuzzy Bayesian model. Analysis results showed that the fuzzy Bayesian model correctly predicted 45% of the print defects with its top prediction, and in 65%, 75%, 83%, and 87% of the cases the actual defect was one of the top two, three, four or five predictions when using 320 English queries. Using 320 Korean queries which were translated into English queries, this model correctly predicted 37% of the print defects with its top prediction, and in 57%, 65%, 75%, and 80% of the cases the actual defect was one of the top two, three, four or five predictions.
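The "top one, two, ..., five predictions" figures reported here are top-k accuracies. A minimal sketch of that measure follows, assuming each query yields a ranked list of defect labels; the function name and the toy rankings are hypothetical.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases whose true defect is among the first k predictions."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked outputs for four queries (most probable defect first).
ranked = [
    ["D1", "D2", "D3"],
    ["D5", "D4", "D6"],
    ["D9", "D7", "D8"],
    ["D2", "D1", "D3"],
]
truth = ["D1", "D4", "D8", "D3"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, truth, k))
```

Top-k accuracy can only stay the same or rise as k grows, which matches the increasing percentages reported for both query languages.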
3.2. The tested system
A Q-CLD tool which also functioned as an English diagnosis tool to identify print defects was developed. Prior to development of the Q-CLD tool, a pilot study (Choe, Kim, Lehto, & Allebach, 2007) was conducted to decide upon an appropriate interface for the tool. Based on the results of the pilot study, a static display, which showed three thumbnails of example defects in return for a query, was chosen. The keywords matching translation approach was used for the Q-CLD tool to determine which examples to display. Only single words were considered to predict the print defects to simplify the Q-CLD tool. Word combinations such as word pairs, word triplets, and word quadruplets were not used for print defect predictions.
As shown in Fig. 2, the troubleshooting process using the query diagnosis tool consisted of three main layers. The first layer was designed to have two steps because the result of the fuzzy Bayesian model validation showed that prediction accuracy was highly improved in predicting the main features of print defects such as color, orientation, and pattern. Considering that Korean subjects had trouble describing print colors in English, the first step was designed to identify the color of the cartridge related to the print defect that the user might be experiencing. The subjects did not have to describe the color feature of the defect. Rather, they could just select one of the colors provided on the website. Then they could focus on describing the other attributes of the defect as it was perceived.
As mentioned before, the queries could be given in either English or Korean. After choosing the color and entering a query, the user could then move to the second layer by clicking on the search button. The second layer showed defect thumbnails in response to a query in the first layer. If there was no thumbnail matching the defect the user was experiencing, the user could move to another page to see other defect thumbnails by clicking the "Prev" or "Next" button. The second layer showed three thumbnails (Choe, Kim et al., 2007) in descending probability order based on the fuzzy Bayesian model. The third layer showed a magnified image of the defect, description, cause of the defect, troubleshooting and suggestions. Regardless of the language subjects entered, either Korean or English, the text information on the website was written in English. The website had a start and an end button to mark the beginning and completion of a trial in the experiment.

Fig. 2. Diagnostic process using the query diagnosis tool.
Table 2
Experimental subject profile.

| Factor | Content |
|---|---|
| Affiliation | Four universities in Korea |
| Occupation | Undergraduate or graduate students |
| Gender | 25 males and 7 females (32 native Korean subjects) |
| Age | From 20 to 30 (Average: 25) |
| Note | Non-colorblind, TOEIC score ≥ 550 |
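Table 3
ANOVA table for the experiment.

| Source of variation | Sum of squares | df formula | df | E(MS) | F0 |
|---|---|---|---|---|---|
| Between subjects | | | | | |
| User level (U) | SSu | p − 1 | 1 | $\sigma_\varepsilon^2 + 2\sigma_s^2 + 2\sigma_u^2$ | MSu/MSs(g) |
| Subject within groups | SSs(g) | p(n − 1) | 30 | $\sigma_\varepsilon^2 + 2\sigma_s^2$ | |
| Within subject | | | | | |
| Query language (Q) | SSq | p − 1 | 1 | $\sigma_\varepsilon^2 + 2\sigma_{qs}^2 + 32\sigma_q^2$ | MSq/MSqs(g) |
| QU | SSqu | (p − 1)(p − 1) | 1 | $\sigma_\varepsilon^2 + 2\sigma_{qs}^2 + 32\sigma_q^2$ | MSqu/MSqs(g) |
| Q × subject within groups | SSqs(g) | p(n − 1)(p − 1) | 30 | $\sigma_\varepsilon^2 + 2\sigma_{qs}^2$ | |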
3.3. Experiment protocol
In the experiment, 32 trained and untrained subjects used Korean and English queries in the Q-CLD tool to identify print defects on four different images. The experiment compared user performance and user preferences between Korean and English queries used in the Q-CLD tool.
3.3.1. Subjects
As shown in Table 2, 32 native Korean subjects participated in the experiment, 25 of whom were male and the other seven were female, ranging from 20 to 30 years old. The subjects were undergraduate and graduate students at four universities in Korea (Hanyang University at Seoul, Hanyang University at Ansan, POSTECH, Kumoh National Institute of Technology). All subjects were native Korean speakers, and therefore they were more comfortable with the Korean language than English. However, all of them had TOEIC (Test of English for International Communication) scores greater than or equal to 550 (Average: 730), which was intended to qualify subjects who were assumed to be proficient enough to read and understand information on the print quality troubleshooting website written in English. Due to the nature of the experiment, which involved comparing printouts in color, none of the subjects were colorblind, as verified using an Ishihara colorblindness test.

Fig. 3. Experimental procedure.
3.3.2. Experimental design
A 2 × 2 mixed design was used in this experiment (Winer, Brown, & Michels, 1991). The independent variables were, respectively, the query language that subjects used to describe print defects and user level. Query language was at one of two levels corresponding to: (1) Korean or (2) English. In one set of the experimental conditions, the subjects entered English queries. In the second, the Q-CLD tool automatically translated Korean queries using the keywords matching translation approach. The user level of subjects participating in the experiment was at one of two levels: (1) trained or (2) untrained. The trained subjects representing expert users were trained for 30 min using the website without the query diagnosis tool before the experiment to attain familiarity with the print defects and terms. The remaining subjects represented novice users and received no training on the website prior to the experiment. The mathematical model of this design is:
$$Y_{ijkl} = \mu + U_i + S_{k(i)} + Q_j + QU_{ij} + QS_{jk(i)} + \varepsilon_{jkl(i)},$$
where Y is the dependent variable, S is subject, Q is query language, and U is user level (i = 1, 2; j = 1, 2; k = 1, 2, 3, ..., 16; l = 1, 2). In this design, query language was a within-subject factor and user level was a between-subject factor. Each subject performed the tasks twice using each interface, resulting in pooled error terms. The ANOVA table showing the effects and tests for significance for the experiment is shown in Table 3. A required sample size of 32 subjects was calculated using an operating characteristic (OC) curve with a Type I error (α = 0.05) and a Type II error (β = 0.1) (Montgomery, 2001).
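The degrees of freedom listed in Table 3 follow from the df formulas with two levels per factor and 16 subjects per user-level group. The short sketch below simply evaluates those formulas; the function and argument names are hypothetical.

```python
def mixed_design_df(levels_between, levels_within, subjects_per_group):
    """Degrees of freedom for a mixed design with one between-subject factor
    and one within-subject factor, using the df formulas of Table 3."""
    a, b, n = levels_between, levels_within, subjects_per_group
    return {
        "User level (U)": a - 1,
        "Subject within groups": a * (n - 1),
        "Query language (Q)": b - 1,
        "Q x U": (a - 1) * (b - 1),
        "Q x subject within groups": a * (n - 1) * (b - 1),
    }


# Two user levels, two query languages, 16 subjects per user-level group.
df = mixed_design_df(levels_between=2, levels_within=2, subjects_per_group=16)
for source, value in df.items():
    print(f"{source}: {value}")
```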
The dependent variables considered in this experiment included measures of user performance and subjective ratings. Time to generate queries, time to identify a print defect, and the number of pages viewed were dependent variables used to measure user performance. The subjective ratings were obtained using a questionnaire on ease of generating queries and identifying print defects, as discussed later in Section 3.3.4.
3.3.3. Experimental apparatus
A laptop computer (Dell XPS140, 14-in.) was used for the experiment and a commercial program, Spector (Professional edition, SpectorSoft), was used to record user performance data such as entered queries, time spent, and pages visited on the website.
Fig. 4. Sample images used for the experiment.
1 The values of the 7-point scale were, respectively: (1) strongly disagree, (2)disagree, (3) slightly disagree, (4) neutral, (5) slightly agree, (6) agree, and (7) stronglyagree.
3.3.4. Procedure
Prior to beginning the experiment, an Ishihara colorblindness test was conducted to screen out colorblind subjects because print defects were highly related to colors. After the test, the subjects were given a consent form as an agreement for their participation. Then they were required to fill out a pre-questionnaire including demographic data for further analyses. After filling out the forms, the subjects were given instructions, which differed depending on subject group. Subjects in the untrained group were given basic instructions describing the objective and procedures followed in the experiment. In addition to the basic instructions that subjects in the untrained group were given, subjects in the trained group had 30 min more time to review the website by themselves to familiarize themselves with the terms and print defects used in the website.
Fig. 3 provides an overview of the experimental procedure. At the beginning of the experiment, an image having no defects was displayed on the laptop monitor and then a printout of the image on the display having a defect was provided to the subject. After comparing the images on the display and printout, the subject was asked to describe the defect of the printout, as he or she perceived it, in English or Korean in the query diagnosis tool provided on the laptop computer, and then to identify the defect using the query diagnosis tool in the website.
As shown in Fig. 4, four different images and defects were used in the experiment. The four defects (cyan broad fuzzy horizontal bands repeating every 1.5 in., magenta missing color, yellow white vertical lines or bands, and black broad fuzzy horizontal bands repeating every 9.4 in.) were chosen to be broadly representative of typical print defects of color laser printers that feature one of four colors (cyan, magenta, yellow, and black) and patterns such as orientation, distance, and thickness of lines or bands.
Each subject repeated the procedure four times with different defects. The defect and the query language in each trial were counterbalanced in order to minimize learning effects. During the experiment, subjects were asked to enter queries in as much detail as possible so that the query diagnosis tool could return a proper result. Subjects were allowed to change queries without limitation, whenever they wanted, for any reason. After subjects were reasonably certain that they had identified the defect, they then clicked on the end button. The visit time at each page and the names of pages viewed by each subject were recorded in the commercial program, Spector, to measure user performance. Because the experiment did not require the subjects to spend time using a printer, there was a 1-min wait embedded into the procedure when incorrectly identifying a print defect. However, the 1-min wait was excluded from the analysis.
After completing each trial, subjects were asked to rate the query diagnosis tool used in the trial on four 7-point subjective rating scales.¹ The subjective rating scales measured the extent to which subjects agreed or disagreed with the following four statements: (1) Using the query diagnosis tool, it was difficult to generate an English (or Korean) query that described this defect adequately; (2) it was easy to find a defect based on the English (or Korean) query I entered; (3) an English (or Korean) query easily occurred to my mind; and (4) the English (or Korean) query I generated was very helpful to diagnose a defect easily.
Pairs of equivalent questions were used to check the internal consistency of the questionnaire using the Cronbach's alpha coefficient, such as (1) and (3) on how easy it was to make queries, and (2) and (4) on how easy it was to diagnose the defect. After the experiment, subjects were asked to compare and give any comments on general preference, satisfaction, or ease of use for further analyses.
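For readers unfamiliar with the coefficient, the sketch below computes Cronbach's alpha with the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The ratings are invented, and reverse-coding the negatively worded statement (1) as 8 − x on the 7-point scale is an assumption of this example, not a step reported in the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k score lists, one per item, with the same respondents
    in the same order in every list.
    """
    k = len(items)
    item_variance_sum = sum(variance(scores) for scores in items)
    totals = [sum(responses) for responses in zip(*items)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings from five respondents on a 7-point scale.
statement_1_raw = [2, 3, 1, 4, 2]   # "it was difficult to generate a query"
statement_3 = [6, 5, 6, 4, 7]       # "a query easily occurred to my mind"

# Reverse-code the negatively worded item so both items point the same way.
statement_1_reversed = [8 - x for x in statement_1_raw]

alpha = cronbach_alpha([statement_1_reversed, statement_3])
print(round(alpha, 3))
```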
3.4. Results
Prior to the statistical analysis, the data were checked for model adequacy in terms of the assumptions of normality and homogeneous variance. The assumption of normality was tested with the Shapiro–Wilk test and the Kolmogorov–Smirnov test, and the assumption of homogeneous variance was tested with a residual plot.
Table 4
User performance measures and preference ratings.

| Dependent variable | Statistic | User level: Trained | User level: Untrained | Query language: Korean | Query language: English | Total avg. |
|---|---|---|---|---|---|---|
| Time-to-query (s) | Mean | 44.3 | 61.1 | 44.8 | 60.6 | 52.7 |
| | SD | 27.2 | 35.3 | 22.3 | 38.8 | 32.4 |
| Time-to-identify (s) | Mean | 43.5 | 67.8 | 43.7 | 67.5 | 55.6 |
| | SD | 47.6 | 52.6 | 26.8 | 65.7 | 51.2 |
| Total identifying time (s) | Mean | 87.8 | 128.9 | 88.5 | 128.1 | 108.3 |
| | SD | 67.6 | 67.5 | 41.9 | 86.1 | 70.1 |
| Number-of-pages-viewed (pages) | Mean | 4.4 | 5.4 | 4.7 | 5.1 | 4.9 |
| | SD | 1.6 | 2.2 | 2.0 | 2.0 | 2.0 |
| Easy-to-query (7-pt scale*) | Mean | 5.4 | 4.8 | 5.5 | 4.6 | 5.1 |
| | SD | 1.0 | 1.4 | 1.0 | 1.3 | 1.2 |
| Easy-to-identify (7-pt scale*) | Mean | 5.5 | 5.0 | 5.3 | 5.1 | 5.2 |
| | SD | 1.0 | 1.1 | 1.2 | 1.0 | 1.1 |

* The values of the 7-pt scale were, respectively: (1) extremely difficult, (2) difficult, (3) slightly difficult, (4) neutral, (5) slightly easy, (6) easy, (7) extremely easy.
Table 5
ANOVA by dependent variable.

| Dependent variable | Source | DF | SS | MS | F value | Pr > F |
|---|---|---|---|---|---|---|
| Time-to-query | Between subjects | | | | | |
| | User level (U) | 1 | 18056.6 | 18056.6 | 3.20 | 0.084* |
| | Subject within groups | 30 | 169347.5 | 5644.9 | | |
| | Within subject | | | | | |
| | Query language (Q) | 1 | 15970.6 | 15970.6 | 8.00 | 0.008** |
| | Q × U | 1 | 1181.6 | 1181.6 | 0.59 | 0.448 |
| | Q × subject within groups | 30 | 59873.2 | 1995.8 | | |
| Time-to-identify | Between subjects | | | | | |
| | User level (U) | 1 | 37733.1 | 37733.1 | 4.34 | 0.046** |
| | Subject within groups | 30 | 261118.5 | 8704.0 | | |
| | Within subject | | | | | |
| | Query language (Q) | 1 | 36290.2 | 36290.3 | 4.17 | 0.051* |
| | Q × U | 1 | 20.3 | 20.3 | 0 | 0.962 |
| | Q × subject within groups | 30 | 261118.5 | 8704.0 | | |
| Number-of-pages-viewed | Between subjects | | | | | |
| | User level (U) | 1 | 70.1 | 70.1 | 5.63 | 0.024** |
| | Subject within groups | 30 | 373.7 | 12.5 | | |
| | Within subject | | | | | |
| | Query language (Q) | 1 | 15.0 | 15.0 | 0.87 | 0.360 |
| | Q × U | 1 | 4.5 | 4.5 | 0.26 | 0.610 |
| | Q × subject within groups | 30 | 517.0 | 17.2 | | |
| Easy-to-query | Between subjects | | | | | |
| | User level (U) | 1 | 5.79 | 5.79 | 6.63 | 0.015** |
| | Subject within groups | 30 | 26.2 | 0.9 | | |
| | Within subject | | | | | |
| | Query language (Q) | 1 | 12.9 | 12.9 | 9.69 | 0.004** |
| | Q × U | 1 | 9.6 | 9.6 | 7.18 | 0.011** |
| | Q × subject within groups | 30 | 40.0 | 1.3 | | |
| Easy-to-identify | Between subjects | | | | | |
| | User level (U) | 1 | 4.6 | 4.6 | 3.88 | 0.058* |
| | Subject within groups | 30 | 36.0 | 1.2 | | |
| | Within subject | | | | | |
| | Query language (Q) | 1 | 0.8 | 0.8 | 0.78 | 0.386 |
| | Q × U | 1 | 3.2 | 3.2 | 2.99 | 0.094* |
| | Q × subject within groups | 30 | 31.8 | 1.06 | | |

* Marginally significant (p value < 0.1).
** Significant (p value < 0.05).
The test statistics for the Shapiro–Wilk test, the Kolmogorov–Smirnov test, and the residual plots showed that the assumptions of normality and homogeneity were not violated at α = 0.05.
Table 4 shows the mean values of the dependent variables of the experiment. This result shows that user performance and preference varied by experience and query language. Overall performance time (e.g., total average = 108 s) was less than half of that (271 s) observed in an earlier study of print defect diagnosis using the HP website, but without the query diagnosis feature (Choe et al., 2006; Kim et al., 2005, 2007; Park et al., 2006).
As expected, trained subjects performed better than untrained subjects, and Korean queries resulted in better performance measures than English queries (Table 4). Although information in the website was written in English, the mean time to identify (44 s) print defects for a Korean query was much shorter than that for an English query (68 s). At the same time, the Korean query (5.3) and the English query (5.1) show similar mean values in terms of the subjective ratings for ease of identifying, which means both query diagnosis tools were helpful in diagnosing the print defects regardless of the query language. The overall Cronbach's alpha coefficient was 0.81 for the subjective ratings, and therefore the subject responses were considered internally consistent.
ANOVA was conducted to check if there were significant differences between the independent variables for each dependent variable. As shown in Table 5, the query language significantly affected the speed of generating queries (p = 0.008), with Korean queries generated faster than English queries. The user level, as expected, significantly affected the time required to identify the defects (p = 0.046), and the number of pages viewed (p = 0.024). User level (p = 0.015), query language (p = 0.004), and their interaction (p = 0.011) were found to significantly affect the ease of query, with untrained subjects rating Korean queries easier than English queries (Fig. 5).
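Following the F0 column of Table 3, each F value in Table 5 is a mean square divided by its error mean square. The sketch below reproduces the three time-to-query F values from the mean squares reported in Table 5; the variable and function names are hypothetical.

```python
def f_ratio(ms_effect, ms_error):
    """F statistic as the ratio of an effect mean square to its error mean square."""
    return ms_effect / ms_error


# Mean squares for time-to-query, as reported in Table 5.
ms_user_level = 18056.6        # User level (U)
ms_subject_within = 5644.9     # Subject within groups (error term for U)
ms_query_language = 15970.6    # Query language (Q)
ms_interaction = 1181.6        # Q x U
ms_q_by_subject = 1995.8       # Q x subject within groups (error term for Q and Q x U)

f_user_level = f_ratio(ms_user_level, ms_subject_within)
f_query_language = f_ratio(ms_query_language, ms_q_by_subject)
f_interaction = f_ratio(ms_interaction, ms_q_by_subject)

print(round(f_user_level, 2), round(f_query_language, 2), round(f_interaction, 2))
```

The between-subject effect is tested against the subject-within-groups mean square, while both within-subject effects share the Q × subject-within-groups mean square as their error term.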
In order to further investigate the difference between trained subjects and untrained subjects when using the same language, a Student–Newman–Keuls test was conducted at α = 0.05. As shown in Table 6, there were no significant differences between trained subjects and untrained subjects except for the speed of identifying queries when using Korean descriptions. Apart from user performance, ratings for ease of generating queries and ease of identifying queries were significantly different between trained subjects and untrained subjects when using the English query diagnosis tool as shown in Table 6. These results suggest the conclusion that the Q-CLD tool makes it easier for untrained subjects to identify defects when using Korean queries. However, it is not conclusive that use of the Q-CLD tool improves performance of defect diagnosis for untrained subjects.
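With only two groups per comparison, the grouping letters in Table 6 can be checked by comparing the difference between the two means with the reported critical range. The sketch below applies that check to the Korean-description rows of Table 6; the data structure and function name are hypothetical, and the critical ranges themselves are taken as given rather than recomputed.

```python
def means_differ(mean_a, mean_b, critical_range):
    """Two means fall into different SNK groups if their absolute
    difference exceeds the critical range."""
    return abs(mean_a - mean_b) > critical_range


# (trained mean, untrained mean, critical range) for Korean descriptions, from Table 6.
korean_rows = {
    "Time-to-query (s)": (38.6, 51.1, 15.7),
    "Time-to-identify (s)": (31.8, 55.6, 17.5),
    "Number-of-pages-viewed": (4.0, 5.3, 1.4),
    "Easy-to-query": (5.4, 5.5, 0.7),
    "Easy-to-identify": (5.4, 5.3, 0.9),
}

significant = [
    name
    for name, (trained, untrained, critical_range) in korean_rows.items()
    if means_differ(trained, untrained, critical_range)
]
print(significant)
```

Only time-to-identify exceeds its critical range, which agrees with it being the one Korean-description row where Table 6 assigns the two user levels to different groups.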
Fig. 5. Significant interactions in ease of generating queries.
Table 7
Primary results by hypotheses.

| Hypothesis | p value | Result |
|---|---|---|
| Hypothesis 1: H0: $\mu_K = \mu_E$, H1: $\mu_K \neq \mu_E$ | | |
| For "time-to-query" | 0.008 | Significant |
| For "time-to-identify" | 0.051 | Marginally significant |
| For "number-of-pages-viewed" | 0.360 | Not significant |
| Hypothesis 2: H0: $\mu_K = \mu_E$, H1: $\mu_K \neq \mu_E$ | | |
| For "easy-to-query" | 0.004 | Significant |
| For "easy-to-identify" | 0.094 | Marginally significant |
| Hypothesis 3: H0: $\mu_{trained(K)} = \mu_{untrained(K)}$, H1: $\mu_{trained(K)} \neq \mu_{untrained(K)}$ | | |
| For "time-to-query" | 0.448 | Not significant |
| For "time-to-identify" | 0.962 | Not significant |
| For "number-of-pages-viewed" | 0.610 | Not significant |
| For "easy-to-query" | 0.011 | Significant |
| For "easy-to-identify" | 0.094 | Marginally significant |
4. Discussion
Table 7 summarizes the primary results of this study in relation to each hypothesis. As shown in Table 7, the results provided partial support for all three hypotheses. The first two hypotheses were supported by the finding that Korean users generated queries for print defects more quickly when using the Q-CLD tool than when using the English query diagnosis tool, and rated the Q-CLD tool as easier to use than the English query diagnosis tool. The “time-to-identify” results (p = 0.051) and the “easy-to-identify” ratings (p = 0.094) were also consistent with these hypotheses, although only marginally significant. However, the “number-of-pages-viewed” did not differ significantly between the two query languages (p = 0.360). Overall, these findings suggest that the Q-CLD tool reduces the time needed to generate queries by making queries easier to generate, and may have little effect on query quality.
Further support for the latter conclusion was provided by the finding that 26 of the 32 subjects reported after the experiment that Korean queries were generally easier to generate and made it easier to identify print defects, because they sometimes had trouble making queries in English. Most of the subjects also mentioned that they used short and limited words when making English queries. Meanwhile, some subjects insisted that English queries were easier to generate because English is more commonly used to describe printers.

Table 6
Student–Newman–Keuls test for user level by descriptions.

Dependent variable                  Critical range   MSE      User level   N    Mean   SNK grouping

Korean description
Time-to-query (seconds)             15.7             470.9    Trained      16   38.6   A
                                                              Untrained    16   51.1   A
Time-to-identify (seconds)          17.5             590.9    Trained      16   31.8   A
                                                              Untrained    16   55.6   B
Number-of-pages-viewed (pages)      1.4              3.6      Trained      16   4.0    A
                                                              Untrained    16   5.3    A
Easy-to-query (7-pt scale)          0.7              1.0      Trained      16   5.4    A
                                                              Untrained    16   5.5    A
Easy-to-identify (7-pt scale)       0.9              1.4      Trained      16   5.4    A
                                                              Untrained    16   5.3    A

Nonnative English description
Time-to-query (seconds)             27.4             1439.3   Trained      16   50.1   A
                                                              Untrained    16   71.2   A
Time-to-identify (seconds)          47.3             4299.0   Trained      16   55.1   A
                                                              Untrained    16   79.9   A
Number-of-pages-viewed (pages)      1.4              3.8      Trained      16   4.7    A
                                                              Untrained    16   5.5    A
Easy-to-query (7-pt scale)          0.8              1.2      Trained      16   5.3    A
                                                              Untrained    16   3.9    B
Easy-to-identify (7-pt scale)       0.7              0.9      Trained      16   5.6    A
                                                              Untrained    16   4.6    B
Trained and untrained subjects differed significantly in their ratings of “easy-to-query” (p = 0.011) and marginally in their ratings of “easy-to-identify” (p = 0.094, Table 7) when using the Q-CLD tool. Untrained subjects rated the Q-CLD tool as easier for generating queries than trained subjects did. However, there were no significant differences in user performance measures such as “time-to-query”, “time-to-identify”, and “number-of-pages-viewed” between trained and untrained subjects when using the Q-CLD tool. Therefore, the results partially supported the third hypothesis, and the latter results are in accord with the previous study (Leman & Lehto, 2003), which showed no significant difference in prediction accuracy among the subject groups.
When interpreting the time results, it should also be noted that the fuzzy Bayesian model using Korean predicted defects slightly less accurately than the model using English. It seems reasonable to expect that if the accuracy of the Korean model were improved through further development to the level of the English model, user performance with the Q-CLD tool and preference for it would improve further.
5. Conclusions and further study
In many real-world systems, decision makers performing diagnosis and troubleshooting tasks must integrate information from distributed information sources. This research showed that a query-based diagnostic tool based on a Bayesian model may be a helpful source of decision support in such settings. Query-translation-based information retrieval was particularly helpful for nonnative English users.
The development of the Q-CLD tool was prompted by the need of companies for localized websites and by previous work showing the effectiveness of query-based print defect troubleshooting (Leman & Lehto, 2003). Overall, this research shows that the Q-KE-CLD tool was very helpful for Korean users, especially novice users. Using a systematic evaluation approach, this research also shows the potential effectiveness of Q-CLD tools for nonnative English users, although more cases are needed to generalize the results. Considering that translating the content of an entire website is often not practical (Aykin, 2005), this study also provides a feasible direction for localizing a print defect diagnosis website using the Q-CLD tool. Additionally, this research shows that the query-based diagnosis tool was very effective for print quality diagnosis when used along with link-based browsing.
However, the Q-CLD tool had some limitations because information in query results was still displayed in English. In this respect, the display of results in the Q-CLD system could be further improved. There are several other potential extensions of this research in terms of model improvement and generalization. Because this research was limited to a Q-KE-CLD tool, additional research is needed to generalize the conclusion that Q-CLD tools are helpful for nonnative English users. Another observation is that model accuracy might be improved using alternative methods such as naïve Bayes or multivariate approaches, including Singular Value Decomposition (SVD-Bayes). The latter methods have potential advantages in certain situations (Noorinaeini & Lehto, 2006). Another potential improvement in model accuracy is to better incorporate semantic and syntactic analyses in the translation process or to consider the sequence of words. Future work should consider other languages and methods of translation to further develop the methods studied here. Machine learning for data collection and the display of query results are other interesting future research topics for implementing a real Q-CLD system.
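To make the contrast between these model types concrete, the sketch below scores a translated query with two rules: a fuzzy Bayes rule, taken here in one common formulation where a category's score is its strongest single-word posterior, and a naïve Bayes rule that multiplies word likelihoods. This is an assumption about the general form of such models, not a description of the model built in this study. The probabilities, defect categories, and function names are invented for illustration.

```python
# Hypothetical toy model: P(word | defect) and prior P(defect).
# All numbers are invented for illustration only.
PRIOR = {"banding": 0.5, "ghosting": 0.5}
LIKELIHOOD = {
    "banding": {"stripe": 0.6, "faint": 0.2, "repeat": 0.1},
    "ghosting": {"stripe": 0.1, "faint": 0.5, "repeat": 0.6},
}

def word_probability(word):
    """Marginal P(word), summed over defect categories."""
    return sum(LIKELIHOOD[c].get(word, 0.0) * PRIOR[c] for c in PRIOR)

def fuzzy_bayes(words):
    """Score each category by its strongest single-word posterior."""
    scores = {}
    for c in PRIOR:
        posteriors = [
            LIKELIHOOD[c].get(w, 0.0) * PRIOR[c] / word_probability(w)
            for w in words
            if word_probability(w) > 0  # skip words unknown to the model
        ]
        scores[c] = max(posteriors, default=PRIOR[c])
    return scores

def naive_bayes(words):
    """Score each category by prior times the product of word likelihoods,
    normalized over categories (no smoothing, so unseen words give zero)."""
    raw = {}
    for c in PRIOR:
        score = PRIOR[c]
        for w in words:
            score *= LIKELIHOOD[c].get(w, 0.0)
        raw[c] = score
    total = sum(raw.values())
    return {c: (s / total if total else 0.0) for c, s in raw.items()}

query = ["faint", "repeat"]  # stand-in for a translated user query
print(fuzzy_bayes(query))
print(naive_bayes(query))
```

Both rules rank the same category first for this toy query, but the fuzzy rule relies on the single most informative word while the naïve rule accumulates evidence across all words, which is one reason their accuracy can differ on real descriptions.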
Acknowledgements
We thank Professors Sung Ho Han (POSTECH, Pohang), In-Jae Jeong (Hanyang University, Seoul), Wook-Gee Lee (Kumoh National Institute of Technology, Gumi), and Dongmin Shin (Hanyang University, Ansan) in South Korea for their help with the experiment. We also thank Keith Brown, John Taggart, and Paul Turnbull at Hewlett Packard for their support and interest in all aspects of the research.
References
Aykin, N. (2005). Usability and internationalization of information technology. New Jersey: LEA.
Barber, K. S., Han, D. C., & Liu, T. H. (2000). Coordinating distributed decision making using reusable interaction specifications. Lecture Notes in Computer Science (LNCS) (Vol. 1881). Berlin Heidelberg: Springer-Verlag.
Boland, Jr., R. J., Maheshwari, A., Te’eni, D., Schwartz, D. G., & Tenkasi, R. V. (1992). Sharing perspectives in distributed decision making. In Proceedings of the 1992 ACM conference on computer-supported cooperative work, Toronto, Ontario, Canada, November, 1992. pp. 306–313.
Brandon, D., Jr. (2001). Localization of web content. Journal of Computing Sciences in Colleges, 17(2), 345–358.
Chen, J. (2003). The construction, use, and evaluation of a lexical knowledge base for English–Chinese cross-language information retrieval. Ph.D. thesis. Syracuse, NY: Syracuse University.
Choe, P. (2007). Query translation-based cross-language diagnosis for nonnative English users. Ph.D. thesis. West Lafayette, IN: Purdue University.
Choe, P., Kim, C., Lehto, M. R., & Allebach, J. P. (2007). Experimental comparison of adaptive vs. static thumbnail displays. HCI International 2007, Lecture Notes in Computer Science (LNCS), Vol. 4551, 41–48.
Choe, P., Lehto, M. R., & Allebach, J. P. (2007). Self-help troubleshooting by Q-KE-CLD based on fuzzy Bayesian model. HCI International 2007, Lecture Notes in Computer Science (LNCS), Vol. 4557, 391–400.
Choe, P., Kim, C., Lehto, M. R., Lehto, X., & Allebach, J. (2006). Evaluating and improving a self-help technical support website: Use of focus group interviews. International Journal of Human–Computer Interaction, 21(3), 333–354.
Diekema, A. R. (2003). Translation events in cross-language information retrieval: Lexical ambiguity, lexical holes, vocabulary mismatch, and correct translations. Ph.D. thesis. Syracuse, NY: Syracuse University.
Foo, S., Hui, S. C., Leong, P. C., & Liu, S. (2000). An integrated help desk support for customer services over the World Wide Web – A case study. Computers in Industry, 41, 129–145.
Jennings, N. R. (1966). Coordination techniques for distributed artificial intelligence. In Foundations of distributed artificial intelligence (pp. 187–210). New York: John Wiley & Sons, Inc.
Kim, C., Choe, P., Lehto, M. R., & Allebach, J. P. (2005). Development of a web-based interactive self-help troubleshooting tool for print quality problems. In International conference on human–computer interaction, Las Vegas, Nevada, USA, 22–27 July, 2005.
Kim, C., Choe, P., Lehto, M. R., & Allebach, J. P. (2007). Effect of providing a web-based collaboration medium for remote customer troubleshooting tasks. HCI International 2007, Lecture Notes in Computer Science (LNCS), Vol. 4558, pp. 47–53.
Kraemer, K. L., & King, J. L. (1988). Computer-based systems for cooperative work and group decision making. ACM Computing Surveys, 20(2), 115–146.
Lee, K., Kageura, K., & Choi, K. (2004). Implicit ambiguity resolution using incremental clustering in cross-language information retrieval. Information processing and management: an international journal (Vol. 40(1), pp. 145–159). Berlin: Springer.
Lehto, M. R. (2004). TextMiner manual. Ann Arbor, MI: Consumer Research, Inc.
Leman, S. (2001). Decision support system to predict print quality. Master's thesis. West Lafayette, IN: Purdue University.
Leman, S., & Lehto, M. (2003). Interactive decision support system to predict print quality. Ergonomics, 46(1–3), 52–67.
Ma, L. (2002). Search tools for internet2. Ph.D. thesis. West Lafayette, IN: Purdue University.
Mitkov, R. (2003). The Oxford handbook of computational linguistics. Oxford; New York: Oxford University Press.
Montgomery, D. C. (2001). Design and analysis of experiments (5th ed.). New York: John Wiley & Sons, Inc. pp. 107–109.
Noorinaeini, A., & Lehto, M. (2006). Hybrid singular value decomposition; a model of human text classification. International Journal of Human Factors Modeling and Simulation, 1(1), 95–118.
Olston, C., & Chi, E. H. (2003). ScentTrails: Integrating browsing and searching on the web. ACM Transactions on Computer–Human Interaction, 10(3), 177–197.
Park, H. J., Santos, H., Kim, C., Choe, P., Kumontoy, R., Low, K., et al. (2006). A web-based self-diagnosis tool to solve print quality issues (Focal paper). In Proceedings of NIP22 22nd international conference on digital printing technologies. Denver, CO, 17–22 September 2006.
Petrelli, D., Hansen, P., Beaulieu, M., & Sanderson, M. (2002). User requirement elicitation for cross-language information retrieval. The new review of information behaviour research-studies of information seeking in context, 3.
Pinson, S. D., Loucã, J. A., & Poraitis, P. (1997). A distributed decision support system for strategic planning. Decision Support Systems, 20(1), 35–51.
Salton, G. (1970). Automatic processing of foreign language documents. Journal of the American Society for Information Science, 21, 187–194.
Salton, G. (1973). Experiments in multi-lingual information retrieval. Information Processing Letters, 2(1), 6–11.
Shneiderman, B., Byrd, D., & Croft, W. B. (1997). Clarifying search: A user-interface framework for text searches. In D-Lib magazine, January, 1997. Accessed online on May 6, 2006 at: http://www.dlib.org/dlib/january97/retrieval/01shneiderman.html.
Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental design (3rd ed.). McGraw-Hill. pp. 702–706.