
Big Data in Learning Analytics - Analytics for Everyday Learning


Page 1: Big Data in Learning Analytics - Analytics for Everyday Learning


Big Data in Learning Analytics – Analytics for Everyday Learning

Stefan Dietze, L3S Research Center, Hannover

24.01.2017

LearnTec 2017, Karlsruhe


Page 2: Big Data in Learning Analytics - Analytics for Everyday Learning

Research areas

Web science, Information Retrieval, Semantic Web, Social Web Analytics, Knowledge Discovery, Human Computation

Interdisciplinary application areas: digital humanities, TEL/education, Web archiving, mobility

Some projects

L3S Research Center


http://l3s.de/

http://stefandietze.net/

Page 3: Big Data in Learning Analytics - Analytics for Everyday Learning

Technology-enhanced Learning / Web-based Learning

Big Data in Learning Analytics? A simplistic perspective


Learning Analytics & Educational Data Mining

Application of data mining techniques to understand learning activities and performance

Traditionally confined to dedicated learning environments and platforms (e.g., Moodle)

Examples: JLA special issue on LA datasets, with data ranging from a few MB to at most 15 GB

Near-complete research corpus: LAK Dataset (http://lak.linkededucation.org)

Page 4: Big Data in Learning Analytics - Analytics for Everyday Learning

Learning Analytics & Knowledge Dataset

Cooperation of L3S, CNR, and The Open University

Near-complete Linked Data corpus of Learning Analytics research publications (~800 papers, since 2009)

Dietze, S., Taibi, D., d'Aquin, M.: Facilitating Scientometrics in Learning Analytics and Educational Data Mining – the LAK Dataset. Semantic Web Journal, 2017.


http://lak.linkededucation.org/
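Since the corpus is published as Linked Data, it can be queried programmatically, e.g. via SPARQL. The following Python sketch uses SPARQLWrapper to count publications per year; note that the endpoint path and the predicates used here (dc:title, dc:date) are illustrative assumptions, not the dataset's documented schema.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint path; the LAK Dataset is Linked Data, but
# this URL and the predicates below are assumptions for illustration.
sparql = SPARQLWrapper("http://lak.linkededucation.org/sparql")
sparql.setQuery("""
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT ?year (COUNT(?paper) AS ?papers)
    WHERE {
        ?paper dc:title ?title ;
               dc:date  ?year .
    }
    GROUP BY ?year
    ORDER BY ?year
""")
sparql.setReturnFormat(JSON)

# Print the number of LA/EDM publications per year.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["year"]["value"], row["papers"]["value"])
```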

Page 5: Big Data in Learning Analytics - Analytics for Everyday Learning

Big Data in Learning Analytics? A simplistic perspective (continued)

Broader understanding: informal learning, micro-learning

Research often focused on resources: sharing, reuse, recommendation

Data examples:

"LinkedUp Catalog": > 50 M resources, 300 M statements

"LRMI/schema.org": > 45 M quads (Common Crawl 2015)

Big Data? Depends, but mostly not! (Volume?)

Page 6: Big Data in Learning Analytics - Analytics for Everyday Learning

LinkedUp Catalog of learning resources

Dataset catalog/registry: http://data.linkededucation.org/linkedup/catalog/

“LinkedUp” (FP7 project): L3S, OU, OKFN, Elsevier, Exact Learning Solutions

Publishing and curation of educational/learning resources according to Linked Data principles

Largest collection of Linked Data about learning resources (approx. 50 datasets, 50 M resources)
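As an illustration of what publishing a learning resource according to Linked Data principles looks like in practice, here is a minimal Python sketch using rdflib; the resource URI and property choices are assumptions for illustration, not the actual LinkedUp Catalog schema.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, RDF

# Illustrative only: URI and vocabulary choices are assumptions.
SCHEMA = Namespace("http://schema.org/")
res = URIRef("http://example.org/resource/intro-to-fractions")

g = Graph()
g.add((res, RDF.type, SCHEMA.CreativeWork))
g.add((res, DC.title, Literal("Introduction to Fractions")))
g.add((res, SCHEMA.learningResourceType, Literal("exercise")))

# Serialize the resource description as Turtle.
print(g.serialize(format="turtle"))
```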


Page 7: Big Data in Learning Analytics - Analytics for Everyday Learning

[Figure: number of entities and statements per pay-level domain (PLD), ranked, with counts on a log scale – a power-law distribution]

Learning resource annotations on the Web?

"Learning Resource Metadata Initiative (LRMI)": a schema.org vocabulary for annotating learning resources in Web documents

Approx. 5,000 PLDs in the "Common Crawl" (2 bn Web documents)

LRMI adoption on the Web (Web Data Commons, WDC) [LILE16]:

2015: 44,108,511 quads, 6,243,721 resources

2014: 30,599,024 quads, 4,182,541 resources

2013: 10,636,873 quads, 1,461,093 resources


Power-law distribution across 4,805 providers / PLDs (a counting sketch follows below)
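A minimal sketch of how such per-provider statistics can be derived from a Web Data Commons N-Quads dump; the file name, the selection of LRMI terms, and the simplified quad parsing are assumptions for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical input: a WDC N-Quads file, where the fourth element
# (the graph) is the URL of the crawled page. File name is assumed.
NQUADS_FILE = "wdc-lrmi-2015.nq"

# A few LRMI terms added to schema.org (illustrative, not exhaustive).
LRMI_TERMS = {
    "http://schema.org/learningResourceType",
    "http://schema.org/educationalAlignment",
    "http://schema.org/educationalUse",
    "http://schema.org/typicalAgeRange",
    "http://schema.org/timeRequired",
}

# Very simplified N-Quads pattern: <s> <p> object <g> .
QUAD = re.compile(r'^(\S+) <([^>]+)> (.+) <([^>]+)> \.$')

counts = Counter()
with open(NQUADS_FILE, encoding="utf-8") as f:
    for line in f:
        m = QUAD.match(line.strip())
        if not m or m.group(2) not in LRMI_TERMS:
            continue
        # Approximate the pay-level domain (PLD) by the host name
        # of the page the quad was extracted from.
        host = urlparse(m.group(4)).hostname or "unknown"
        counts[host] += 1

# Ranked output; plotted on a log scale, this yields the
# power-law shape shown in the figure above.
for rank, (pld, n) in enumerate(counts.most_common(20), 1):
    print(f"{rank:3d}  {pld:40s}  {n}")
```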

Taibi, D., Dietze, S.: Towards Embedded Markup of Learning Resources on the Web: A Quantitative Analysis of LRMI Term Usage. In: Companion Publication of the WWW 2016 Conference, IW3C2, Montreal, Canada, April 11, 2016.


http://lrmi.itd.cnr.it/

Page 8: Big Data in Learning Analytics - Analytics for Everyday Learning

Big Data in Learning Analytics? A simplistic perspective (continued)

Big Data? Depends, but mostly not! (Velocity?)


Page 9: Big Data in Learning Analytics - Analytics for Everyday Learning


(Informal) Learning on the Web?


Anything can be a learning resource

The activity makes the difference (not the resource), i.e., how a resource is being used

Learning Analytics in online/non-learning environments?

o Activity streams,

o Social graphs (and their evolution),

o Behavioural traces (mouse movements, keystrokes)

o ...

Research challenges:

o How to detect "learning"?

o How to detect learning-specific notions such as "competences" or "learning performance"?

Page 10: Big Data in Learning Analytics - Analytics for Everyday Learning


"AFEL – Analytics for Everyday (Online) Learning"


Examples of AFEL data sources:

• Activity streams and behavioral traces

• L3S Twitter Crawl: 6 bn tweets

• Common Crawl (2015): 2 bn documents

• Web Data Commons (2015): 1 TB = 24 bn quads

• "German Academic Web": 6 TB Web crawl (recrawled quarterly)

• Wikipedia edit history: 3 M edits/month (English)

• ....

H2020 project (running since 12/2015) aimed at understanding and supporting learning in social Web environments

Page 11: Big Data in Learning Analytics - Analytics for Everyday Learning

Big Data Challenges/Tasks in AFEL & beyond: some examples


I Efficient data capture

Crawling & extracting activity data

Crawling, extracting and indexing learning resources (e.g., Common Crawl; see the crawl-processing sketch after this list)

II Efficient data analysis

Understanding learning resources: entity extraction & clustering on large Web crawls of resources

“Search as learning”: detecting learning in heterogeneous search query logs & click streams

Detecting learning activities: detection of learning patterns (e.g., competent behavior) in the absence of learning objectives & assessments (!)

o Obtaining performance indicators from behavioral traces?

o Quasi-experiments in crowdsourcing platforms to obtain training data
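As referenced under task I above, here is a minimal sketch of processing a Common Crawl WARC file to find candidate learning resources, using the warcio library; the file name and the naive LRMI-marker heuristic are assumptions for illustration.

```python
from warcio.archiveiterator import ArchiveIterator

# Hypothetical local WARC segment from Common Crawl (normally fetched
# from the Common Crawl bucket); the file name is an assumption.
WARC_FILE = "CC-MAIN-2015-segment-00000.warc.gz"

# Pages mentioning an LRMI property are candidate learning resources.
LRMI_MARKER = b"learningResourceType"

candidates = []
with open(WARC_FILE, "rb") as stream:
    for record in ArchiveIterator(stream):
        if record.rec_type != "response":
            continue  # skip request/metadata records
        url = record.rec_headers.get_header("WARC-Target-URI")
        body = record.content_stream().read()
        if LRMI_MARKER in body:
            candidates.append(url)

print(f"{len(candidates)} candidate learning resources found")
```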

Gadiraju, U., Demartini, G., Kawase, R., Dietze, S.: Human Beyond the Machine: Challenges and Opportunities of Microtask Crowdsourcing. IEEE Intelligent Systems, Vol. 30, Issue 4, Jul/Aug 2015.

Gadiraju, U., Kawase, R., Dietze, S., Demartini, G.: Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online Surveys. ACM CHI Conference on Human Factors in Computing Systems (CHI 2015), April 18-23, Seoul, Korea.

Page 12: Big Data in Learning Analytics - Analytics for Everyday Learning



Detecting competence in online users?

Capturing assessment data: microtasks in CrowdFlower

"Content Creation (CC)": transcription of CAPTCHAs

"Information Finding (IF)": finding the middle names of famous persons

1,800 assessments: 2 tasks × 3 durations × 3 difficulty levels × 100 users

Level 1: "Daniel Craig"

Level 2: "George Lucas" (profession: Archbishop)

Level 3: "Brian Smith" (profession: ice hockey, born: 1972)

Behavioral traces: keystroke and mouse events (a feature-extraction sketch follows below)

timeBeforeInput, timeBeforeClick

tabSwitchFreq

windowToggleFreq

openNewTabFreq

WindowFocusFrequency

totalMouseMovements

scrollUpFreq, scrollDownFreq

….

Total number of events: 893,285 (CC tasks), 736,664 (IF tasks)

[Screenshot: IF task interface – "Find the middle name of: …"]
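As referenced above, a minimal sketch of aggregating such raw behavioral events into per-assessment features; the event schema and the small feature set are simplified assumptions, not the actual logging format used in the experiments.

```python
from collections import Counter

# Hypothetical raw event log, one dict per captured browser event;
# field names are assumptions, not the real trace schema.
events = [
    {"type": "taskShown",     "t": 0.0},
    {"type": "mouseMove",     "t": 0.8},
    {"type": "tabSwitch",     "t": 2.1},
    {"type": "keyDown",       "t": 3.5},
    {"type": "mouseMove",     "t": 4.0},
    {"type": "taskSubmitted", "t": 9.2},
]

def extract_features(events):
    """Aggregate a raw event stream into per-assessment features."""
    by_type = Counter(e["type"] for e in events)
    duration = events[-1]["t"] - events[0]["t"]
    first_key = next((e["t"] for e in events if e["type"] == "keyDown"), None)
    return {
        # Counterparts of some features listed above:
        "timeBeforeInput": (first_key - events[0]["t"])
                           if first_key is not None else None,
        "tabSwitchFreq": by_type["tabSwitch"] / duration,
        "totalMouseMovements": by_type["mouseMove"],
        "totalTime": duration,
    }

print(extract_features(events))
```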

Page 13: Big Data in Learning Analytics - Analytics for Everyday Learning


Predicting competence from behavioural traces?

Training data

Manual annotation of 1,800 assessments

Performance types [CHI15]:

o "Competent Worker"

o “Diligent Worker”

o “Fast Deceiver”

o “Incompetent Worker”

o “Rule Breaker”

o “Smart Deceiver”

o “Sloppy Worker”

Prediction of performance types from behavioral traces?

Predicting learner types from behavioral traces

"Random Forest" classifier (one per task)

10-fold cross-validation

Prediction performance measured as accuracy and F-measure (see the sketch below)

Results

Longer assessments yield more signals

Simpler assessments yield more conclusive signals

"Competent" and "Diligent Workers" (CW, DW): accuracy of 91% and 87%, respectively

Most significant features: “TotalTime”, “TippingPoint”, “MouseMovementFrequency”, “WindowFocusFrequency”
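A minimal sketch of the described setup using scikit-learn: a random forest classifier evaluated with 10-fold cross-validation on accuracy and (macro) F-measure. The feature matrix here is random placeholder data standing in for the real behavioral trace features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Placeholder data: one row per assessment, 10 assumed trace features
# (e.g., totalTime, tabSwitchFreq), and the 7 worker types as labels.
rng = np.random.default_rng(42)
X = rng.random((1800, 10))
y = rng.integers(0, 7, size=1800)

clf = RandomForestClassifier(n_estimators=100, random_state=42)

# 10-fold cross-validation with the metrics named on the slide.
scores = cross_validate(clf, X, y, cv=10,
                        scoring=["accuracy", "f1_macro"])
print("accuracy:", scores["test_accuracy"].mean())
print("F1 (macro):", scores["test_f1_macro"].mean())

# Feature importances indicate the most significant signals
# (on the slide: TotalTime, TippingPoint, MouseMovementFrequency, ...).
clf.fit(X, y)
print("importances:", clf.feature_importances_)
```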

Page 14: Big Data in Learning Analytics - Analytics for Everyday Learning


Other features to predict competence in learning/assessments?

“Dunning-Kruger Effect”

Incompetence in a task/domain reduces the capacity to recognize/assess one's own incompetence

Research question

Self-assessment as an indicator of competence?

Results

Self-assessment is a reliable indicator of competence (94% accuracy), superior to mere performance measurement

The tendency to overestimate one's own competence increases with the difficulty level

David Dunning. 2011. The Dunning-Kruger Effect: On Being Ignorant of One’s Own Ignorance. Advances in experimental social psychology 44 (2011), 247.

[Figure: performance ("accuracy") of users classified as "competent"]
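A minimal sketch of how self-assessment calibration could be turned into a competence signal; the scores, thresholds, and decision rule are purely illustrative assumptions, not the method evaluated on the slide.

```python
# Illustrative rule: a user is flagged competent if actual performance
# is high and self-assessment does not overshoot it (well-calibrated).
def is_competent(self_assessed: float, actual: float,
                 min_actual: float = 0.7, max_gap: float = 0.15) -> bool:
    """Both scores in [0, 1]; thresholds are illustrative only."""
    return actual >= min_actual and (self_assessed - actual) <= max_gap

print(is_competent(self_assessed=0.8, actual=0.85))  # True
print(is_competent(self_assessed=0.9, actual=0.5))   # False: overestimation
```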

Page 15: Big Data in Learning Analytics - Analytics for Everyday Learning


Summary & outlook

Learning analytics in online & Web-based settings

o Detection of learning & learning-related notions in the absence of assessment/performance indicators?

o Analysis of a wide range of data, including behavioral traces, activity streams, self-assessment, etc.

o Actual big data

Positive results from initial models and classifiers

Application of the developed models and classifiers in online (learning) environments (e.g., the AFEL project)

o GNOSS/Didactalia (200,000 users)

o LearnWeb

o Deutsche Welle online

o …

Page 16: Big Data in Learning Analytics - Analytics for Everyday Learning

Acknowledgements: Team


Pavlos Fafalios (L3S)

Besnik Fetahu (L3S)

Ujwal Gadiraju (L3S)

Eelco Herder (L3S)

Ivana Marenzi (L3S)

Ran Yu (L3S)

Pracheta Sahoo (L3S, IIT India)

Bernardo Pereira Nunes (L3S, PUC Rio de Janeiro)

Mathieu d'Aquin (The Open University, UK)

Davide Taibi (CNR, Italy)

...


Questions? http://stefandietze.net