
Critical issues in the collection, analysis and use of students’ (digital) data

By Paul Prinsloo (University of South Africa)

Presentation at the Centre for Higher Education Development (CHED), University of Cape Town, Wednesday 8 April 2015

Image credit: http://graffitiwatcher.deviantart.com/art/Big-Brother-is-Watching-173890591

ACKNOWLEDGEMENTS

I do not own the copyright of any of the images in this presentation and hereby acknowledge the original copyright and licensing regime of every image and reference used. All the images used in this presentation have been sourced from Google and were labeled for non-commercial reuse.

This work (excluding the images) is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

Overview of the presentation

• Map the collection, analysis and use of students’ digital data against the backdrop of discourses re surveillance/sousveillance & Big Data/lots of data

• Problematise the collection, analysis and use of student digital data …

• User knowledge and choice in the context of the collection, analysis and use of data

• When our good intentions go wrong…

• Do students know?

• Points of departure

• Implications

• (In)conclusions

Mapping the context…

Image credit: http://en.wikipedia.org/wiki/Mappa_mundi

The collection, analysis and use of students’ digital data in the context of…

• Claims that Big Data in higher education will change everything and that student data are “the new black” and “the new oil”

• Our “quantification fetish”, the “algorithmic turn” and “techno-solutionism” (Morozov, 2013a, 2013b)

• The current meta-narratives of “techno-romanticism” in education (Selwyn, 2014)

• The belief that data are “raw” and “speak for themselves”, and that collecting even more data necessarily results in better understanding and interventions

The collection, analysis and use of students’ digital data in the context of… (2)

• Ever-increasing concerns about surveillance, and new forms of “societies of control” (Deleuze, 1992)

• The “algorithmic turn” and the “algorithm as institution” (Napoli, 2013)

• A possible “gnoseological turning point” where our belief about what constitutes knowledge is changing and where individuals are reduced to classes and numbers (Totaro & Ninno, 2014). N=all (Lagoze, 2014)

• Claims that “Privacy is dead. Get over it” (Rambam, 2008)

Problematising the collection, analysis and use of student digital data…

Problematising the collection, analysis and use of student data…

• Privacy as concept & as enforceable construct is fragile (Crawford & Schultz, 2014; Prinsloo & Slade, 2015)

• Legal & regulatory frameworks (permanently?) lag behind (Silverman, 2015)

• Consent is more than a binary of opt-in or opt-out (Miyazaki & Fernandez, 2000; Prinsloo & Slade, 2015)

• Individuals share unprecedented amounts of information and yet are increasingly concerned about privacy (Murphy, 2014)

• Discrimination is a fundamental building block in the collection, analysis & use of data (Pfeifle, 2014; Tene & Polonetsky, 2014)

• There are increasing concerns re the lack of algorithmic accountability (Diakopoulos, 2014; Pasquale, 2014) & the fracturing of the control zone (Lagoze, 2014)

• There are also concerns about the unintended consequences of the collection, analysis & use of data (Wigan & Clark, 2013)

Mapping the collection, analysis and use of student digital data against the discourses of surveillance/sousveillance

From surveillance to sousveillance…

Image credit: http://commons.wikimedia.org/wiki/File:SurSousVeillanceByStephanieMannAge6.png

Jennifer Ringley – 1996-2003 – webcam

Source: http://onedio.com/haber/tum-zamanlarin-en-etkili-ve-onemli-internet-videolari-36465

If I did not share it on Facebook, did it really happen?

We share more than ever before, we are watched more than ever before and we watch each other more than ever before…

Privacy in flux…

Surveillance 101

Image credit: http://en.wikipedia.org/wiki/Surveillance

Image source: https://www.mpiwg-berlin.mpg.de/en/news/features/feature14 Copyright could not be established

• 1749: Jacques François Guillauté proposed “le serre-papiers” – the Paperholder – to King Louis XV

• One of the first attempts to articulate a new technology of power – one based on traces and archives (Chamayou, n.d.)

• The stored documents comprised individual reports on each and every citizen of Paris

The technology will allow the sovereign “…to know every inch of the city as well as his own house, he will know more about ordinary citizens than their own neighbours and the people who see them everyday (…) in their mass, copies of these certificates will provide him with an absolute faithful image of the city” (Chamayou, n.d.)

The Paperholder – “le serre-papiers” (1749)

More recently…

“Secrets are lies”

“Sharing is caring”

“Privacy is theft”

(Eggers, 2013, p. 303)

Welcome to “The Circle”

TruYou – “one account, one identity, one password, one payment system, per person. (…) The devices knew where you were… One button for the rest of your life online… Anytime you wanted to see anything, use anything, comment on anything or buy anything, it was one button, one account, everything tied together and trackable and simple…”

(Eggers, 2013, p. 21)

“Hidden algorithms can make (or ruin) reputations, decide the destiny of entrepreneurs, or even devastate an entire economy. Shrouded in secrecy and complexity, decisions at major Silicon Valley and Wall Street firms were long assumed to be neutral and technical. But leaks, whistleblowers, and legal disputes have shed new light on automated judgment. Self-serving and reckless behavior is surprisingly common, and easy to hide in code protected by legal and real secrecy. Even after billions of dollars of fines have been levied, underfunded regulators may have only scratched the surface of this troubling behavior.”

http://www.hup.harvard.edu/catalog.php?isbn=9780674368279

Mapping the collection, analysis and use of student digital data against the discourses of Big Data/lots of data…

What is Big Data?

• Huge in volume
• High in velocity, being created in or near real time
• Diverse in variety
• Exhaustive in scope
• Fine-grained in resolution and uniquely indexical in identification
• Relational in nature
• Flexible, holding traits of extensionality (can add new fields easily) and scalability (can expand in size rapidly)

(Kitchin, 2013, p. 262)

Exploring the differences between Big Data/lots of data… (Lagoze, 2014)

Mayer-Schönberger & Cukier (2013) –

• N=all – Big Data as presenting a “complete view” of reality
• Big Data permits us to lessen our desire for exactitude
• We need to shed some of our obsession with causality in exchange for correlations – not necessarily knowing (or caring) about the why but focusing on the what

Lots of data – methodological challenges
Big Data – epistemological challenges

Big Data as a cultural, technological, and scholarly phenomenon (boyd & Crawford, 2012)

Big Data as the interplay of:

• Technological: maximising computation power and algorithmic accuracy to gather, analyse, link, and compare large data sets

• Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims

• Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of trust, objectivity, and accuracy

(boyd & Crawford, 2012, p. 663)

Three sources of data

Directed: a digital form of surveillance wherein the “gaze of the technology is focused on a person or place by a human operator”

Automated: generated as “an inherent, automatic function of the device or system and include traces …”

Volunteered: “gifted by users and include interactions across social media and the crowdsourcing of data wherein users generate data” (emphasis added)

(Kitchin, 2013, pp. 262–263)
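As a minimal sketch (not part of the original deck), Kitchin’s taxonomy could be used to tag each record in a student data store with its source; the names DataSource and StudentDataPoint, and the example records, are invented for illustration:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class DataSource(Enum):
    """Kitchin's (2013) three sources of data."""
    DIRECTED = "directed"        # surveillance gaze focused by a human operator
    AUTOMATED = "automated"      # traces produced as an inherent function of a system
    VOLUNTEERED = "volunteered"  # data 'gifted' by users, e.g. social media posts


@dataclass
class StudentDataPoint:
    """One record in a hypothetical student data store."""
    student_id: str
    source: DataSource
    description: str
    collected_at: datetime


# How typical learning-analytics records might be classified:
records = [
    StudentDataPoint("s001", DataSource.DIRECTED,
                     "proctor's review of an exam session", datetime(2015, 4, 8)),
    StudentDataPoint("s001", DataSource.AUTOMATED,
                     "LMS clickstream log entry", datetime(2015, 4, 8)),
    StudentDataPoint("s001", DataSource.VOLUNTEERED,
                     "forum post in a course discussion", datetime(2015, 4, 8)),
]

for r in records:
    print(f"{r.student_id}: {r.source.value} - {r.description}")
```

Tagging records by source makes visible how much of a student’s profile was never deliberately “given” – a point the next slide’s concerns about consent and contextual integrity pick up.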

The Trinity of Big Data:

• Different sources / varying quality and integrity of data

• Different role-players with different interests: individuals, corporates, governments, higher education, data brokers, fusion centres

• Different methods/types of surveillance, harvesting and analysis

• Issues re informed consent; reuse/contextual integrity/context collapse; ethics/privacy/justice/care

Adapted & refined from Prinsloo, P. (2014). A brave new world. Presentation at SAAIR, 16-18 October. http://www.slideshare.net/prinsp/a-brave-new-world-student-surveillance-in-higher-education

Image credit: http://commons.wikimedia.org/wiki/File:Red_sandstone_Lattice_piercework,_Qutb_Minar_complex.jpg

Image credits: http://commons.wikimedia.org/wiki/File:DARPA_Big_Data.jpg

“Privacy and big data are simply incompatible and the time has come to reconfigure choices that we made decades ago to enforce constraints”

(Lane, Stodden, Bender & Nissenbaum, 2015, p. xii)

Critical questions for big data – boyd & Crawford (2012)

1. Big data changes the definition of knowledge – “Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves” (Anderson, 2008, in boyd & Crawford, 2012, p. 666)

2. Claims to objectivity and accuracy are misleading – “working with Big Data is still subjective, and what it quantifies does not necessarily have a closer claim on objective truth” (boyd & Crawford, 2012, p. 667). Big Data “enables the practice of apophenia: seeing patterns where none actually exist, simply because enormous quantities of data can offer connections that radiate in all directions” (ibid., p. 668)

Critical questions for big data (2) – boyd & Crawford (2012)

3. Bigger data are not always better data

4. Taken out of context, Big Data loses its meaning – leading to context collapse

5. Just because it is accessible does not make it ethical – the difference in ethical review procedures and oversight between research and ‘institutional research’

6. Limited access to Big Data creates new digital divides

User knowledge and choice in the context of the collection, analysis and use of data

Image credit: http://www.mailbow.net/eng/blog/opt-in-and-op-out/

“Providing people with notice, access, and the ability to control their data is key to facilitating some autonomy in a world where decisions are increasingly made about them with the use of personal data, automated processes, and clandestine rationales, and where people have minimal abilities to do anything about such decisions”

(Solove, 2013, p. 1899; emphasis added)


A framework for mapping the collection, use and sharing of personal user information

(Miyazaki & Fernandez, 2000)

• Never collect or identify users

• Users explicitly opting in to have data collected, used and shared

• Users explicitly opting out

• The constant collection, analysis and sharing of user data with users’ knowledge

• The constant collection, analysis and sharing of user data without users’ knowledge

Also see Prinsloo, P., & Slade, S. (2015). Student vulnerability, agency and learning analytics: an exploration. Presentation at LAK15, Poughkeepsie, NY, 16 March 2015

http://www.slideshare.net/prinsp/lak15-workshop-vulnerability-final
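Read purely as an ordered spectrum, the Miyazaki & Fernandez framework might be sketched as follows. This is an illustration only, not from the deck: ConsentRegime and may_collect are invented names, and reducing consent to booleans is a deliberate oversimplification of the point that consent is more than a binary:

```python
from enum import IntEnum


class ConsentRegime(IntEnum):
    """The five positions of the Miyazaki & Fernandez (2000) continuum,
    ordered roughly from most to least protective of the user."""
    NEVER_COLLECT = 0               # never collect or identify users
    EXPLICIT_OPT_IN = 1             # collect only after users explicitly opt in
    EXPLICIT_OPT_OUT = 2            # collect unless users explicitly opt out
    CONSTANT_WITH_KNOWLEDGE = 3     # constant collection, with users' knowledge
    CONSTANT_WITHOUT_KNOWLEDGE = 4  # constant collection, without users' knowledge


def may_collect(regime: ConsentRegime, opted_in: bool, opted_out: bool) -> bool:
    """Crude illustration of what each regime permits; real consent is
    contextual rather than a single boolean (Prinsloo & Slade, 2015)."""
    if regime is ConsentRegime.NEVER_COLLECT:
        return False
    if regime is ConsentRegime.EXPLICIT_OPT_IN:
        return opted_in
    if regime is ConsentRegime.EXPLICIT_OPT_OUT:
        return not opted_out
    return True  # both 'constant' regimes collect regardless of user choice


# A non-response is not the same as opting in:
print(may_collect(ConsentRegime.EXPLICIT_OPT_IN, opted_in=False, opted_out=False))  # False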

The constraints of privacy self-management …

• It is almost impossible to comprehend the scope of data collected, analysed and used, the combination with other sources of information, the future uses for historical information and the possibilities of re-identification of de-personalized data

• These various sources of information and combinations of sources start to resemble “electronic collages” and an “elaborate lattice of information networking” (Solove, 2004, p. 3)

• The fragility of consent… what may be innocuous data in one context, may be damning in another

Adapted from Prinsloo, P., & Slade, S. (2015). Student privacy self-management: implications for learning analytics. Presentation at LAK15, Poughkeepsie, NY, 16 March 2015

http://www.slideshare.net/prinsp/lak15-workshop-vulnerability-final

When our good intentions go wrong…

Using student data and student vulnerability: between the devil and the deep blue sea?

• Students (some more vulnerable than others)

• Generation, harvesting and analysis of data

• Our assumptions, selection of data and algorithms may be ill-defined

• Turning ‘pathogenic’ – “a response intended to ameliorate vulnerability has the paradoxical effect of exacerbating existing vulnerabilities or generating new ones” (Mackenzie et al., 2014, p. 9)

Adapted from Prinsloo, P., & Slade, S. (2015). Student vulnerability, agency and learning analytics: an exploration. Presentation at LAK15, Poughkeepsie, NY, 16 March 2015

http://www.slideshare.net/prinsp/lak15-workshop-vulnerability-final

So, do students know…?

Do students know/have the right to know…

• what data we harvest from them
• about the assumptions that guide our algorithms
• when we collect data & for what purposes
• who will have access to the data (now & later)
• how long we will keep the data & for what purpose & in what format
• how we will verify the data &
• whether they have access to confirm/enrich their digital profiles…?

Adapted from Prinsloo, P., & Slade, S. (2015). Student privacy self-management: implications for learning analytics. Presentation at LAK15, Poughkeepsie, NY, 16 March 2015

http://www.slideshare.net/prinsp/lak15-workshop-vulnerability-final

Do they know?

Do they have the right to know?

Can they opt out and what are the implications if they do/don’t?

Adapted from Prinsloo, P., & Slade, S. (2015). Student privacy self-management: implications for learning analytics. Presentation at LAK15, Poughkeepsie, NY, 16 March 2015

http://www.slideshare.net/prinsp/lak15-workshop-vulnerability-final

Points of departure (1)

(Big) data is… not an unqualified good (boyd & Crawford, 2011) and “raw data is an oxymoron” (Gitelman, 2013) – see Kitchin, 2014

Technology and specifically the use of data have been and will always be ideological (Henman, 2004; Selwyn, 2014) and embedded in relations of power (Apple, 2004; Bauman, 2012)

“… ‘educational technology’ needs to be understood as a knot of social, political, economic and cultural agendas that are riddled with complications, contradictions and conflicts” (Selwyn, 2014, p. 6)

Points of departure (2):

If we accept that… what are the implications for the collection, analysis and use of student data?

Points of departure (3): The (current?) limitations of our surveillance

• Students’ digital lives are but a minute part of a bigger whole – but our collection and analysis pretend that this minute part represents the whole

• We create smoke and claim we see a fire – so what does the number of clicks mean?

• We seldom ask: what if our algorithms are wrong, and what are the long-term implications for students?

What are the implications for the collection, analysis and use of student (digital) data? (Prinsloo & Slade, 2015)

1. The duty of reciprocal care

• Make T&Cs as accessible and understandable as possible (the latter may mean longer…)

• Make it clear what data are collected, when, for what purpose, for how long they will be kept and who will have access and under what circumstances (see the sketch after this list)

• Provide users access to information and data held about them, to verify and/or question the conclusions drawn, and where necessary, provide context

• Provide access to a neutral ombudsperson (Prinsloo & Slade, 2015)
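One hedged way to operationalise this duty of disclosure, offered here as an illustration rather than anything proposed in the deck: DataCollectionDisclosure and every field value below are invented, but the fields deliberately mirror the “do students know?” questions above:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCollectionDisclosure:
    """A hypothetical machine-readable disclosure, answering the
    'do students know?' questions for one category of collected data."""
    data_category: str             # what data we harvest
    purposes: List[str]            # why/when we collect it
    algorithmic_assumptions: str   # assumptions guiding our algorithms
    who_has_access: List[str]      # who has access, now & later
    retention_years: int           # how long we keep the data
    storage_format: str            # in what format
    verification_process: str      # how the data are verified
    student_can_view_and_amend: bool = True  # access to confirm/enrich profiles


# An invented example for one data category:
lms_clicks = DataCollectionDisclosure(
    data_category="LMS clickstream",
    purposes=["early-warning risk flags", "course design review"],
    algorithmic_assumptions="clicks are treated as a proxy for engagement (contested)",
    who_has_access=["course team", "institutional research"],
    retention_years=5,
    storage_format="pseudonymised event log",
    verification_process="students may view and annotate their profiles",
)
print(f"{lms_clicks.data_category}: kept {lms_clicks.retention_years} years")
```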

What are the implications …? (2)

2. The contextual integrity of privacy and data – ensure the contextual integrity and lifespan of personal data. Context matters…

3. Student agency and privacy self-management

• The fiduciary duty of higher education implies a social contract of goodwill and ‘do no harm’
• The asymmetrical power relationship between institution and students necessitates transparency, accountability, access and input/collaboration
• Empower students – digital citizenship/care
• The costs and benefits of sharing data with the institution should be clear
• Higher education should not accept a non-response as equal to opting in…

(Prinsloo & Slade, 2015)

What are the implications …? (3)

4. Future direction and reflection

• Rethink consent and employ nudges – move away from thinking just in terms of a binary of opting in or out, and provide a range of choices in specific contexts or needs
• Develop partial privacy self-management – based on context/need/value
• Adjust privacy’s timing and focus – the downstream use of data, the importance of contextual integrity, the lifespan of data
• Move toward substance over neutrality – blocking troublesome and immoral practices, but also soft, negotiated spaces of reciprocal care

(Prinsloo & Slade, 2015)

Ethical use of Student Data for Learning Analytics Policy

An example of the institutionalisation of thinking about the ethical implications of using student data

Available at: http://www.open.ac.uk/students/charter/essential-documents/ethical-use-student-data-learning-analytics-policy

(In)conclusions

“The way forward involves

(1) developing a coherent approach to consent, one that accounts for the social science discoveries about how people make decisions about personal data;

(2) recognising that people can engage in privacy self management only selectively;

(3) adjusting privacy law’s timing to focus on downstream uses; and

(4) developing more substantive privacy rules.

These are enormous challenges, but they must be tackled”

(Solove, 2013)

(In)conclusions

“Technology is neither good nor bad; nor is it neutral… technology’s interaction with social ecology is such that technical developments frequently have environmental, social, and human consequences that go far beyond the immediate purposes of the technical devices and practices themselves”

Melvin Kranzberg (1986, p. 545, cited in boyd & Crawford, 2012)

THANK YOU

Paul Prinsloo (Prof)
Research Professor in Open Distance Learning (ODL)
College of Economic and Management Sciences, Office number 3-15, Club 1, Hazelwood, P O Box 392, Unisa, 0003, Republic of South Africa

T: +27 (0) 12 433 4719 (office)
T: +27 (0) 82 3954 113 (mobile)

[email protected]
Skype: paul.prinsloo59

Personal blog: http://opendistanceteachingandlearning.wordpress.com

Twitter profile: @14prinsp