Muthu Kumar Chandrasekaran, Chencan Xu, Pengyu Li, Min-Yen Kan, Bernard C.Y. Tan, Kiruthika Ragupathi, & NUS-HCI Group
Instructors, Learners and Machines: Learning instructor intervention from MOOC forums
Slides: bit.ly/kan-gcms15
Prologue: Andrew Ng’s morning coffee
16 Aug 2015, GCMS - Min-Yen Kan
Pictures courtesy: www.ige3.unige.ch, i.livescience.com & usatcollege.files.wordpress.com
1. Learning Instructor Intervention on MOOCs: Teachers for Learners
2. Enabling Peer Annotations in MOOCs: Learners for Learners
3. Automating Annotations in MOOCs: Machine for Learners
1. LEARNING INSTRUCTOR INTERVENTION IN MOOCS
Instructors for Learners
2. Enabling Peer Annotations in MOOCs: Learners for Learners
3. Automating Annotations in MOOCs: Machine for Learners
Chandrasekaran et al. (2015). Learning instructor intervention from MOOC forums: Early Results and Issues. Education Data Mining (EDM '15), Madrid, Spain.
Deliberate Practice: Problems with Scalability
MOOCs operate at a huge scale and involve distance learning
Discussion forums are correspondingly massive
We need to do more with the resources we have
Successful Intervention
Scaling instructor intervention?
• Instructors cannot reply to, or even read, every post on a MOOC forum
• Compelling pedagogical reasons to intervene
  – But how much and when to intervene?
We propose a system to identify threads that merit an instructor’s attention!
Practical Outcomes
• Forum triage tools
• Prescriptive guidelines for intervention
Freely Annotated Data!
Corpus
D14 Corpus
Forum type   All threads   All posts   Intervened threads   Intervened posts
Homework     3,868         31,255      1,385                6,120
Lecture      2,392         13,185      1,008                3,514
Errata       326           1,045       134                  206
Exam         822           6,285       405                  1,721
Total        7,408         51,770      2,932                11,561
Data from 14 MOOCs (D14) from diverse subject areas with different numbers of threads and interventions.
The feature study was done using this corpus.
D61 Corpus (scaled up)
Forum type   All threads   All posts   Intervened threads   Intervened posts
Total        26,643        205,835     7,740                31,779
Data from 61 MOOCs (D61) is about 3 times larger.
Our best set of features was tested on D61.
Classifier
• Logistic regression classifier.
• We use class weights w to counterbalance the inherent class imbalance in this data.
  – Left uncorrected, the imbalance biases prediction towards majority class instances.
• Class weights are learned from the training set by greedily optimising for maximum F1 score.
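The classifier setup can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: the gradient-descent trainer, the single positive-class weight (with the negative class fixed at 1), the step size of the greedy search, and the toy data are all assumptions made for the example.

```python
import math

def f1_score(y_true, y_pred):
    """F1 of the positive (intervened) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def train_logreg(X, y, pos_weight, epochs=300, lr=0.1):
    """Batch gradient descent on a class-weighted log loss (assumed trainer)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            # Errors on positive (minority) examples are scaled by pos_weight.
            err = (pos_weight if yi == 1 else 1.0) * (p - yi)
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(model, X):
    w, b = model
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]

def greedy_weight_search(X, y, start=1.0, step=0.5, max_weight=10.0):
    """Raise the positive-class weight while training-set F1 keeps improving."""
    best_w = start
    best_f1 = f1_score(y, predict(train_logreg(X, y, best_w), X))
    cand = start + step
    while cand <= max_weight:
        f1 = f1_score(y, predict(train_logreg(X, y, cand), X))
        if f1 <= best_f1:
            break  # greedy: stop at the first weight that does not help
        best_w, best_f1 = cand, f1
        cand += step
    return best_w, best_f1

# Toy, made-up data: one feature, two intervened threads out of eight.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
best_weight, best_f1 = greedy_weight_search(X, y)
print(best_weight, round(best_f1, 2))
```

The search is greedy in the simplest sense: it stops at the first candidate weight that fails to improve F1, so it can miss a better weight further along the grid.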
New feature/marker: Forum type
Ratio of intervened to non-intervened threads over D14 across the 4 forum types
Encodes intervention priority as perceived by the instructor.
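As a worked example, the ratio can be computed per forum type from the thread counts in the D14 corpus table. The function and variable names are illustrative only.

```python
# Thread counts from the D14 corpus table: (all threads, intervened threads).
D14_THREADS = {
    "Homework": (3868, 1385),
    "Lecture": (2392, 1008),
    "Errata": (326, 134),
    "Exam": (822, 405),
}

def intervention_ratio(all_threads, intervened):
    """Ratio of intervened to non-intervened threads."""
    return intervened / (all_threads - intervened)

ratios = {name: intervention_ratio(*counts) for name, counts in D14_THREADS.items()}

# Print forum types from highest to lowest ratio.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<9}{ratio:.2f}")
```

On these counts the Exam forum has the highest ratio (about 0.97) and Homework the lowest (about 0.56), which is the kind of per-forum difference the feature is meant to capture.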
New feature: Entity references to course materials
New feature: non-lexical references
• URLs
• Timestamps from videos
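A rough sketch of how such non-lexical references might be counted in a post. The regular expressions and the feature names are assumptions for illustration; the slides do not give the patterns that were actually used, and the timestamp pattern will also match times of day such as 10:30.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Assumed timestamp shapes: m:ss, mm:ss or h:mm:ss.
TIMESTAMP_RE = re.compile(r"\b(?:\d{1,2}:)?\d{1,2}:\d{2}\b")

def non_lexical_features(post):
    """Count URLs and video-style timestamps mentioned in a forum post."""
    return {
        "num_urls": len(URL_RE.findall(post)),
        "num_timestamps": len(TIMESTAMP_RE.findall(post)),
    }

# Made-up post for illustration.
post = "At 12:35 the slide looks wrong, see https://example.org/errata and also 1:02:10."
print(non_lexical_features(post))
```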
Other features
• Unigrams (~98,000 unique terms)
• Thread properties
  – Length: as # posts, # comments, total; as # sentences.
  – Structure: as average # comments / post.
• Affirmation of the original post by fellow students.
Forum type and other features improve significantly over unigrams
#   Features                            Precision   Recall   F1
1   Unigrams                            41.98       61.39    45.58
2   1 + forum type                      41.36       69.13    48.01
3   2 + lexical entity references       41.09       66.57    47.22
4   3 + affirmations                    41.20       68.94    47.68
5   4 + thread properties               42.99       70.54    48.86
6   5 + # of sentences                  43.08       69.88    49.77
7   6 + non-lexical entity references   42.37       74.11    50.56
8   Ablating entity references          45.96       79.12    54.79
Predicting interventions is difficult. Performance varies widely.
Course                Intervention ratio   F1 individual (20% test set)   F1 D14 (full course is test set)
ml-005                0.45                 64.96                          56.56
rprog-003             0.32                 49.62                          48.70
calc1-003             0.60                 51.29                          68.91
smac-001              0.17                 25.00                          33.26
compilers-004         0.02                 14.28                          4.91
maththink-004         0.49                 63.56                          63.29
medicalneuro-002      0.76                 75.36                          81.94
musicproduction-006   0.01                 0.00                           1.03
gametheory2-001       0.19                 28.57                          30.16
Average               0.36                 41.59                          45.54
Weighted Macro Avg    0.40                 49.04                          50.56
Does scaling up the corpus help?
Varying intervention ratios make training and test set distributions different
Corpus     P       R       F1
14 MOOCs   45.96   79.12   54.79*
61 MOOCs   42.80   76.29   50.96*
* Uses the best-performing feature set from the previous experiment, i.e., all except course refs
Limitations
• Variation among courses in the # of threads
• Intervention decision may be subjective
• Simple baselines outperform learned models
• Previous results are not replicable
Diversity across courses
The # of threads and their intervention ratios in forums over D14
Diversity across different courses in volume of threads and interventions
Simple baselines work better
Course                F1 individual (20% test set)   F1 @100%R   F1 on D14 (full course is test set)   F1 @100%R
ml-005                64.96                          63.79       72.35                                 61.83
rprog-003             49.62                          47.39       48.55                                 49.31
calc1-003             51.29                          74.83       70.63                                 75.33
smac-001              25.00                          34.67       34.15                                 29.28
compilers-004         14.28                          3.28        4.82                                  4.75
maththink-004         63.56                          63.08       61.11                                 65.49
medicalneuro-002      75.36                          88.66       78.06                                 85.67
musicproduction-006   0.00                           4.35        1.09                                  1.72
gametheory2-001       28.57                          45.16       27.12                                 30.56
Average               41.59                          46.43       45.18                                 47.09
Weighted Macro Avg    49.04                          51.51       54.79                                 53.22
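One reading of the F1 @100%R columns is a baseline that flags every thread for intervention. Recall is then 1 and precision equals the fraction of threads that were intervened, so F1 = 2r / (1 + r). The sketch below assumes that reading, and assumes the "Intervention ratio" column of the earlier per-course table is that fraction r; neither is stated explicitly on the slides.

```python
def f1_at_full_recall(intervened_fraction):
    """F1 of a baseline that flags every thread: precision = fraction, recall = 1."""
    precision, recall = intervened_fraction, 1.0
    if precision <= 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Intervention ratios taken from the per-course table (rounded to two decimals there).
for course, ratio in [("calc1-003", 0.60), ("medicalneuro-002", 0.76), ("compilers-004", 0.02)]:
    print(course, round(100 * f1_at_full_recall(ratio), 2))
```

Under these assumptions the computed values (75.0, 86.36 and 3.92) land close to the full-course F1 @100%R column (75.33, 85.67 and 4.75); the gaps are consistent with the ratios being rounded to two decimals.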
Is intervention subjective?
This is further indicated by weak inter-annotator agreement among instructors (k=0.53).
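The slide does not say which agreement statistic k denotes. Assuming it is Cohen's kappa between two annotators, the computation looks like the sketch below; the two label lists are made up for illustration and are not the instructors' annotations.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical intervene (1) / do not intervene (0) decisions on ten threads.
annotator_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(round(cohen_kappa(annotator_a, annotator_b), 2))
```

Here the annotators agree on 8 of 10 threads, yet kappa is only about 0.58 because roughly half of that agreement would be expected by chance.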
Professor A
Prefers not to intervene.
Students use the forum for peer learning.
Professor B
Prefers to intervene as often as possible.
To engage students and correct misconceptions.
Photo credits: UCL Institute of Education. Used under Creative Commons License
Variables that influence intervention
• Course discipline and topic
• Time within the course
• Individual instructor personality
• Availability
Working towards best practices for intervention
Future Work: Intervention framework roadmap
Real-time
Re-intervention
Role-based
Thread Ranking
• Mitigates intervention subjectivity
• Makes intervention decision at post-level
• Optimises recommendations for instructor / TA
Future Work: Annotation Plan
Phase 0: Pilot
- Single course
- Create novice annotation guideline
- Test expert/novice annotation fidelity
Phase 1: Small
- NUS MOOC data only
- Understand expert/novice differences
- Refine novice annotation plans
Phase 2: Medium-scale
- MOOC Consortium data spanning many disciplines
- Run full-scale novice crowdsourced annotations
Simplified Intervention Typology
Peer Interventions
• Feedback Request
• Paraphrase
• Juxtaposition
• Refinement
• Clarification
• Completion
Instructor Interventions
• Justification Request
• Extension
• Reasoning Critique
• Integration / Summing up
Properties
• Replicable
• Annotatable by novice
• Enabling implementation / model building
Proposed by the team from a framework based on “Measuring the development of features of moral discussion” by M. W. Berkowitz and J. C. Gibbs, 1983, Merrill-Palmer Quarterly, 29, pp. 399-410, further refined by Teasley, 1999.
Novice Annotation (working towards)
Can novices approximate expert annotation?
– Other studies show mixed results, attributed to various factors
1. Students
• Limited scalability, requires in-place annotation
2. Mechanical Turk
• Uses a worldwide source of people’s spare time to annotate
• Needs simple instructions that don’t take long to interpret
• Must control for cheating
1. Learning Instructor Intervention on MOOCs: Instructors for Learners
2. ENABLING PEER ANNOTATIONS FOR MOOCS
Learners for Learners
3. Automating Annotations in MOOCs: Machine for Learners
Monserrat et al. (2014). L.IVE: An Integrated Interactive Video-based Learning Environment, ACM CHI 2014
Current Platforms: Separated Learning
• Forum
• Video
• Assessment
L.IVE file descriptor
Outcome: Rich annotation possible by peers or instructors
1. Learning Instructor Intervention in MOOCs: Instructors for Learners
2. Enabling Peer Annotations in MOOCs: Learners for Learners
3. AUTOMATING ANNOTATIONS IN MOOCS
Machine for Learners
3.1 NoteVideo
3.2 Automated Entity Linking
Monserrat et al. (2013). NoteVideo: Facilitating Navigation of Blackboard-style Lecture Videos, ACM CHI 2013, 1139-1148
Distribution of Blackboard Activities
… In a typical Khan Academy video
User Study (n=15)
• Significantly better at 3 of 4 tasks
• Error Distance comparable
Final: Design Implications
Scrubber: Shows sequence / flow of visual action
• Cannot determine information by random access
• Small thumbnail
• Bigger thumbnail = bigger bandwidth
Transcript: Allows search of text not easily identifiable in visual objects
• Only highlights hits and still shows unrelated transcript
• Mapping between text and visual object cannot be retrieved at a glance
NoteVideo: Spatial layout of visual objects that facilitates random access
• Sequence of play not always clear
• Difficult to find information if there is no clear visual cue
Automatic Entity Linking
• Mention: “Module 3, Slide 5”
• System-added href points to the appropriate section
Could be done:
- As a post-process
- As the original poster is writing the post
Problem Statement
1. Mention recognition: Identify concrete entity mentions that appear in MOOC forums.
2. Unique identifier scheme: Add hyperlinks to mentions using a designed scheme, which needs to be transparent and readable to humans.
3. Scheme resolution: Resolve a scheme instance to find the actual URL of the entity.
Current: Single Concrete Instances
We currently identify single, concrete, within-course entities (SCI).
Examples:
✓ Problem 7.8
✓ quiz 3
✓ module 13
✓ slide 5
✗ Problem of overfitting
✗ the video recommended by Prof
✗ Problems mentioned in last class
Four main SCI entities:
1. Problem – a problem within a problem set, such as Practice problem 7.68, problem7.7 of text, Problem 3 of Quiz 1.
2. Quiz – a certain course quiz, such as Quiz 1, Quiz 2, Week3 quiz.
3. Lecture – a certain course lecture, such as Module 3, lecture 5, module23.
4. Slide – a course slide, such as slide 5, slide 10, slide 11.
Preliminary statistics
From our manual annotation of two courses, we find ~20% of posts have entity mentions.
We then used simple regular expressions (keyword + number) to match entity mentions. The precision was more than 90%.
Course                   Reg Exp matches   # manually checked   # verified correct
3d-motion                19                19                   19
acoustics1-001           19                6                    6
advancedchemistry-001    58                14                   9
amnhearth-002            10                5                    5
analyze-001              113               26                   24
apstat-001               78                14                   14
automata-002             111               11                   11
bioinfomethods2-001      9                 6                    6
vlsicad-002              4                 4                    4
virtualassessment-001    24                7                    5
Entity Mention Recognition
Pattern 1: keyword + number
Keyword list:
• Question, Problem
• Quiz, Exam
• Homework, Assignment
• Week, Module, Video
• Lecture, Slide
Pattern 2: lecture name
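Pattern 1 (keyword + number) can be sketched as a single regular expression built from the keyword list. The exact expression used in the study is not shown on the slides, so the pattern below, including its handling of decimals such as 7.8 and of missing spaces such as module23, is an assumption.

```python
import re

# Keyword list from the Entity Mention Recognition slide.
KEYWORDS = ["question", "problem", "quiz", "exam", "homework", "assignment",
            "week", "module", "video", "lecture", "slide"]

# keyword, optional whitespace, then a number with an optional decimal part.
MENTION_RE = re.compile(
    r"\b(" + "|".join(KEYWORDS) + r")\s*(\d+(?:\.\d+)?)\b",
    re.IGNORECASE,
)

def find_mentions(post):
    """Return (keyword, number) pairs for every keyword + number mention."""
    return [(m.group(1).lower(), m.group(2)) for m in MENTION_RE.finditer(post)]

print(find_mentions("Problem 7.8 quiz 3 module 13 slide 5"))
print(find_mentions("Problems mentioned in last class"))
```

The first call picks up all four concrete mentions, while the second returns an empty list because "Problems" is not followed by a number.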
Transparent Scheme Design
Prefix: http://<hostname>/mxr/
Middle: coursera/ml-002/ (platform name / course ID)
Suffix:
• lecture/4 or lecture/supervised_learning
• lecture/3/section/3
• lecture/3/slide or lecture/4/section/3/slide
• lecture/3/slide/19
• quiz/3, lecture/4/quiz
• quiz/3/question/4, lecture/4/quiz/question/5
Should be guessable by users
• Similar to bootstrapping conventions in #hashtags, e.g. #lecture5
Scheme still in progress
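A small sketch of how a link in this scheme could be assembled from its prefix, middle and suffix parts. The function name and signature are hypothetical, and since the scheme is described as still in progress, the layout here only mirrors the examples on the slide.

```python
from urllib.parse import quote

def build_scheme_url(hostname, platform, course_id, *parts):
    """Join prefix, middle and suffix into a human-readable entity link."""
    prefix = f"http://{hostname}/mxr"
    middle = f"{quote(platform)}/{quote(course_id)}"
    suffix = "/".join(quote(str(p)) for p in parts)
    return f"{prefix}/{middle}/{suffix}"

print(build_scheme_url("wing.comp.nus.edu.sg", "coursera", "ml-002",
                       "lecture", 3, "slide", 19))
```

Because every segment is a plain word or number, a student who has seen one such link can plausibly guess the link for a neighbouring slide or lecture.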
Scheme Resolution
Designed scheme → Transform function → Actual URL
(Figure: a snapshot of the HTML source in Coursera)
1. Automatically analyse the web structure and extract the actual URL
2. Crowdsource the resolution from students
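The transform function can be pictured as parsing a scheme instance and looking it up in a resolution table. The sketch below assumes such a table exists; the parsing helper, the table, and the placeholder target URL are all hypothetical, since the slides only name the two ways the mapping could be obtained (automated page analysis or crowdsourcing).

```python
from urllib.parse import urlparse

def parse_scheme(url):
    """Split a scheme instance into (platform, course ID, suffix segments)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 4 or segments[0] != "mxr":
        raise ValueError(f"not a scheme instance: {url}")
    return segments[1], segments[2], tuple(segments[3:])

# Hypothetical table; real entries would come from page analysis or crowdsourcing.
RESOLUTION_TABLE = {
    ("coursera", "ml", ("lecture", "14", "section", "4")):
        "https://example.org/placeholder-actual-url",
}

def resolve(url, table=RESOLUTION_TABLE):
    """Return the actual URL for a scheme instance, or None if unresolved."""
    return table.get(parse_scheme(url))

link = "http://wing.comp.nus.edu.sg/mxr/coursera/ml/lecture/14/section/4"
print(parse_scheme(link))
print(resolve(link))
```

Keeping resolution as a table lookup means the readable links stay stable even if the platform changes its own page URLs; only the table entries need updating.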
Delivery by Browser Extension
http://wing.comp.nus.edu.sg/mxr/coursera/ml/lecture/14/section/4
Options: Hyperlink, Sidebar, Below post preview
Future Work – Scaling Up
1. Larger scale annotation / resolution
2. Investigate mention variation and ambiguity
3. Adapt to MOOC webpage design changes
4. Finer grained alignment
5. Integration with manual annotation tools
Finer Granularity – Content Based Alignment
Epilogue: Conclusion / Calling for MOOC Data Consortium Partners
The MOOC Data Consortium: Enabling reproducible large-scale research
For researchers needing to study and replicate prior work
Coursera has given their official support and recognition
Slides: bit.ly/kan-gcms15
Email: [email protected]
Website: wing.comp.nus.edu.sg/downloads/moocdata
Coursera’s Statement of Support
“As a platform for delivering world-class education and advancing the frontiers of online pedagogy, Coursera encourages the use of its platform to facilitate novel research across a broad range of disciplines, while concurrently protecting the privacy of learners. We support the described research focusing on forum activity and the proposal that this research span courses from across our partner institutions.”
Conclusion: Instructors, Learners, Machines
• Learning at scale means understanding individual courses and their quirks
  – Non-reproducibility of results: a key issue stalling MOOC research
• #convention before (system-learned) customization
• Rich interlinking of resources
  – Annotated by learners as well as machines
Publications:
• Chandrasekaran et al. (2015). Learning instructor intervention from MOOC forums: Early Results and Issues. Education Data Mining (EDM '15), Madrid, Spain.
• Monserrat et al. (2014). L.IVE: An Integrated Interactive Video-based Learning Environment, ACM CHI 2014
• Monserrat et al. (2013). NoteVideo: Facilitating Navigation of Blackboard-style Lecture Videos, ACM CHI 2013, 1139-1148