How to remove an outlier tester
Lucjan Janowski
Faculty of Electrical Engineering, Automatics, Computer Science and Electronics
Department of Telecommunications
Agenda
• Can a tester be an outlier?
• The detection philosophy
• Latent variables
• Rasch model
• WinSteps
• The final decision
• Conclusion
2008 I 05-07
Can a tester be an outlier?
What would we like to model?
• Why do we use testers?
• A tester represents human perception, which is difficult to model
• People are different, and so are our users/clients. Our goal is to take these differences into account
• Some of us are critical and others are uncritical
• A tester can be tired or not focused enough, and therefore his/her answers can be random
A tired tester problem
• A user can be tired too. Should we remove all tired testers?
• Can a tester score randomly? What are the consequences?
• Note that a tester who scores a picture differently from the average score is not necessarily a random tester
• We have to be very careful when removing testers, since our goal is to build a model of the average user, not of an ideal user
Why are some scores different?
• Different effects (e.g. motion intensity, color) can affect different testers' judgements differently
• Testers have different experience (e.g. watching mainly YouTube or films on DVD)
• Each of us is more or less critical of anything he/she judges
• The words describing the opinion scale can be understood differently (in Poland "OK" means good, in England "OK" means fair)
What can we do?
• We have to detect random scores
• A tester who often scores randomly should be excluded from model building
• An answer that differs from the average score is not necessarily random; therefore we have to compare it with the average score corrected for the tester's individuality
• We need a mathematical model of user behavior that takes these properties into account
Latent variable
[Figure: any distortion that influences QoE → what the tester sees → opinion score (OS)]
Latent variable manifestation
[Figure: the latent variable shown against four 5-point opinion scales (5 4 3 2 1)]
An example
Tester  Video ID (increasing distortion)
ID         0   1   2   3   4   5   6   7   8   9  10
147       10   9  10   7   4   2   5   4   2   1   1
148       10   9   8   5   4   3   1   3   2   1   1
149        8  10   9   2   7   4   3   1   1   0   1
150        9   9   9   5   6   5   3   2   5   2   2
151        8   7   8   7   6   6   5   2   5   3   2
152       10   9   7   8   7   4   3   3   2   1   1
153        3   6   4   3   3   3   3   3   3   2   1
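As a simple, non-Rasch illustration of comparing a score with the average score corrected for a tester's individual bias, the sketch below uses the example table: it computes the mean opinion score (MOS) of each video, estimates each tester's constant offset from the MOS, and reports the RMS of what remains after removing that offset. The additive-offset correction and the function names are assumptions made for illustration only; with this correction, a consistently critical tester shows up mainly as a negative offset rather than as large residuals.

```python
from statistics import mean

# Opinion scores from the example table: tester ID -> scores for videos 0..10
SCORES = {
    147: [10, 9, 10, 7, 4, 2, 5, 4, 2, 1, 1],
    148: [10, 9, 8, 5, 4, 3, 1, 3, 2, 1, 1],
    149: [8, 10, 9, 2, 7, 4, 3, 1, 1, 0, 1],
    150: [9, 9, 9, 5, 6, 5, 3, 2, 5, 2, 2],
    151: [8, 7, 8, 7, 6, 6, 5, 2, 5, 3, 2],
    152: [10, 9, 7, 8, 7, 4, 3, 3, 2, 1, 1],
    153: [3, 6, 4, 3, 3, 3, 3, 3, 3, 2, 1],
}


def mos_per_video(scores):
    """Mean opinion score of each video over all testers (column means)."""
    return [mean(column) for column in zip(*scores.values())]


def offset_corrected_rms(scores):
    """Hypothetical consistency measure: for each tester, subtract a constant
    offset from the MOS and return (offset, RMS of the remaining residuals)."""
    mos = mos_per_video(scores)
    result = {}
    for tester, row in scores.items():
        deviations = [s - m for s, m in zip(row, mos)]
        offset = mean(deviations)  # negative for a critical tester
        residuals = [d - offset for d in deviations]
        result[tester] = (offset, mean(r * r for r in residuals) ** 0.5)
    return result


if __name__ == "__main__":
    for tester, (offset, rms) in offset_corrected_rms(SCORES).items():
        print(f"tester {tester}: offset {offset:+.2f}, residual RMS {rms:.2f}")
```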
Testers who avoid extreme values
(Same table as in "An example" above.)
Testers using the whole range, from 10 to 1
(Same table as in "An example" above.)
Critical tester
(Same table as in "An example" above.)
Are the answers random?
(Same table as in "An example" above.)
Rasch model
• We assume that the latent variable is what the testers actually score
• We assume that the probability of an opinion score is a logistic function of the model parameters (i.e. its logit is linear in them)
• The function has parameters describing:
– a tester “criticism” factor
– a film/picture/… quality
– an average threshold value for each particular score
Rasch model equation
• n: the tester number
• i: the object number (what is scored)
• x: the opinion score value (1-5, 0-10, …)
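$$P_{nix} = \frac{e^{(\beta_n - \delta_i - \tau_x)}}{1 + e^{(\beta_n - \delta_i - \tau_x)}}$$
where β_n is the “criticism” factor of tester n (a larger β_n means higher scores), δ_i describes the quality of object i (in this sign convention a larger δ_i means lower scores) and τ_x is the threshold for score x.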
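In the rating scale form of the Rasch model (Andrich's formulation), the slide's logistic expression gives the probability that tester n gives object i score x rather than x − 1; the probability of each individual score then follows by chaining these steps. The sketch below assumes that formulation and uses our own notation (beta for the tester parameter, delta for the object parameter, taus for the thresholds); the function name and the parameter values in the usage block are hypothetical.

```python
import math


def category_probabilities(beta, delta, taus):
    """Rating scale (Andrich) category probabilities for one tester and one object.

    beta  -- tester parameter ("criticism" factor; larger -> higher scores)
    delta -- object parameter (larger -> lower scores)
    taus  -- thresholds tau_1..tau_m between consecutive score categories

    Returns the probabilities of the m + 1 categories, lowest score first.
    """
    # Log-numerator of category k: cumulative sum of (beta - delta - tau_j), j <= k
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + beta - delta - tau)
    top = max(log_num)  # subtract the maximum before exp() for numerical stability
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical parameters for a 1-5 scale (four thresholds)
    probs = category_probabilities(beta=0.5, delta=-0.2, taus=[-2.0, -0.5, 0.5, 2.0])
    for score, p in enumerate(probs, start=1):
        print(f"P(score = {score}) = {p:.3f}")
```

For any two neighbouring scores, P(x) / (P(x) + P(x − 1)) computed from this output equals the logistic term with τ_x, which is how the slide's equation relates to the full set of score probabilities.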
(In the example table above, each row corresponds to a tester n, each column to an object i, and each entry is an opinion score x.)
Rasch model
• We assume that the Rasch model is correct and that data that do not fit this model are incorrect
• Note that without such an assumption we are not able to detect randomly scoring testers
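[Figure: the data, comparing model values with observed values]
The model value for tester n and object i is the expected score on the 1-5 scale:
$$E_{ni} = \sum_{x=1}^{5} x \, P_{nix}$$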
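The model value E_ni = Σ_x x·P_nix is a plain weighted sum. The sketch below computes it from a given list of score probabilities, together with the model variance W_ni = Σ_x (x − E_ni)²·P_nix, which fit statistics such as the OMS rely on. The function names are hypothetical and the probabilities in the usage block are made up.

```python
def expected_score(probs, scores):
    """E_ni = sum over x of x * P_nix."""
    return sum(x * p for x, p in zip(scores, probs))


def score_variance(probs, scores):
    """W_ni = sum over x of (x - E_ni)^2 * P_nix (model variance of the score)."""
    e = expected_score(probs, scores)
    return sum((x - e) ** 2 * p for x, p in zip(scores, probs))


if __name__ == "__main__":
    # Hypothetical probabilities of the scores 1..5 for one tester and one object
    probs = [0.05, 0.15, 0.40, 0.30, 0.10]
    scores = [1, 2, 3, 4, 5]
    print(expected_score(probs, scores), score_variance(probs, scores))
```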
OMS (Outfit Mean Square)
• Knowing the model probabilities and the tester's answers, we can estimate how far a tester is from the model
• A tester's accuracy or quality is assessed with the OMS (Outfit Mean Square)
• The Rasch model can be estimated with the WinSteps software (http://www.winsteps.com/)
• The OMS can be interpreted using heuristically obtained ranges
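The outfit mean square of a tester is commonly defined as the unweighted mean, over the objects the tester scored, of the squared standardized residuals z_ni² = (x_ni − E_ni)² / W_ni, where E_ni is the model's expected score and W_ni its model variance; for data that fit the model its value is close to 1. The sketch below assumes E_ni and W_ni have already been obtained from an estimated Rasch model (for example, from WinSteps output); the function name and the numbers in the usage block are hypothetical.

```python
def outfit_mean_square(observed, expected, variances):
    """Outfit mean square of one tester: mean of (x_ni - E_ni)^2 / W_ni
    over all objects i the tester scored."""
    if not observed or not (len(observed) == len(expected) == len(variances)):
        raise ValueError("need non-empty sequences of equal length")
    z_squared = [(x - e) ** 2 / w for x, e, w in zip(observed, expected, variances)]
    return sum(z_squared) / len(z_squared)


if __name__ == "__main__":
    # Hypothetical scores, model expectations and model variances for three objects
    print(outfit_mean_square([4, 2, 5], [3.5, 2.5, 3.0], [1.0, 1.0, 1.0]))
```

Because the residuals are squared and not weighted by their variance a second time, a single very unexpected answer (the 5 where 3.0 was expected above) dominates the value, which is why OMS is sensitive to occasional random scores.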
Results interpretation
• OMS > 2: the tester is not relevant and should be removed
• 1.5 < OMS < 2: we should be suspicious of the tester
• 0.5 < OMS < 1.5: correct tester
• OMS < 0.5: the tester fits the model too well
Example results
Tester  Video ID (increasing distortion)
ID         0   1   2   3   4   5   6   7   8   9  10   OMS
147       10   9  10   7   4   2   5   4   2   1   1  1.78
148       10   9   8   5   4   3   1   3   2   1   1  1.23
149        8  10   9   2   7   4   3   1   1   0   1  2.81
150        9   9   9   5   6   5   3   2   5   2   2  0.90
151        8   7   8   7   6   6   5   2   5   3   2  0.76
152       10   9   7   8   7   4   3   3   2   1   1  1.36
153        3   6   4   3   3   3   3   3   3   2   1  0.67
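The heuristic OMS ranges can be applied mechanically to the example results. The sketch below maps each OMS value from the table to its range; the function name is hypothetical, and how the exact boundary values 0.5, 1.5 and 2 are assigned is an assumption, since the ranges leave them open.

```python
# OMS values from the example results table (tester ID -> OMS)
OMS = {147: 1.78, 148: 1.23, 149: 2.81, 150: 0.90, 151: 0.76, 152: 1.36, 153: 0.67}


def interpret_oms(oms):
    """Map an OMS value to the heuristic ranges (boundary handling is assumed)."""
    if oms > 2.0:
        return "remove"
    if oms > 1.5:
        return "suspicious"
    if oms >= 0.5:
        return "correct"
    return "fits too well"


if __name__ == "__main__":
    for tester in sorted(OMS):
        print(tester, OMS[tester], interpret_oms(OMS[tester]))
```

With these ranges, tester 149 (OMS 2.81) is the only one flagged for removal, tester 147 (1.78) is suspicious, and the critical tester 153 (0.67) is kept, in line with the point that a critical tester is not necessarily a random one.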
Rasch model disadvantages
• It is more accurate with more data, but collecting many results is difficult because the tests are expensive
• Not all types of correct tester behavior can be modeled
• The algorithms are not implemented in Matlab, so it is difficult to include them in an automatic analysis done in Matlab
Conclusion
• A tester's answers make it possible to model human perception, but not all of his/her answers are correct
• Outliers should be removed
• The Rasch model helps to detect irrelevant testers
• The final decision should be checked, since not all correct behaviors can be modeled by the Rasch model