Practical Free-text Plagiarism Investigation

Fintan Culwin
School of Computing
London South Bank University
London SE1 0AA
fintan@sbu.ac.uk

Important Disclaimer(?)

On 18th August, the JISC plagiarism service launched an improved version, addressing some of the points raised in these notes!

On the same day all SBU servers were taken off-line due to the Slammer worm!!

Some of the comments made on the JISC service may have been addressed by the improvements! These notes and the JISC brochure refer largely to the old version; comments on the new version are shown in red.

General Description, as advertised

This tutorial will introduce the processes involved in using the JISC service and give examples of its use from the 300 plus final year computing and BIT projects that were processed in 2003. The limitations of the service as discovered will be explored and the design and implementation of additional tools needed to complement the JISC service will be presented.

Free text plagiarism is a large and growing problem. Tools to assist with its detection are sadly necessary but unfortunately not sufficient. Some of the reasons why some students resort to cheating will be explored and some of the pedagogic responses that can possibly forestall it will be presented.

Specific Learning Objectives

Following the tutorial attendees will have knowledge of:

The nature and limitations of the JISC plagiarism detection service.

The operation of the JISC service.

Interpretation of the results of the JISC service and use of the JiscView utility.

The use of the OrCheck tool to follow up a JISC investigation.

The use of the Praise tool to detect intra-corporal plagiarism.

The use of Freestyler to investigate single documents.

My Qualifications

Much of the background and experience for this tutorial results from processing over 300 final year projects through the JISC system in the summer of 2003.

Additionally, a number of utilities and systems have been developed at SBU for free text originality investigation.

Before that, experience of developing and operating source code plagiarism detection systems since circa 1992. This led to the JISC-commissioned report on source code plagiarism detection.

Available Services

JISC plagiarism detection service using iParadigms (aka TurnItIn) technology. Free to UK institutions for at least the next year.

UKRUND, originally a Swedish service now based in Brussels (awaiting evaluation).

CopyCatch, desktop intra-corporal system, now free of charge.

OrCheck, (also PRAISE, VAST & FreeStyler) free of charge from SBU.

Various other systems with varying degrees of capability and availability (FindSame, HowOriginal etc.).

Classification space

intra-corporal vs. extra-corporal
document vs. corpus
desktop vs. server
commercial vs. free
database vs. stylistics
text-only vs. styled documents
open vs. proprietary
in house vs. remote

Why do students cheat?

because the task they have been set is too difficult for them

because they are not capable of doing the task set

because they are capable but not sufficiently organised

because they are capable but want a better mark

because everyone else is cheating

because cheating has become a habit

because they do not agree that they are cheating

because the resources required are not available

because the tutor connives with the cheating

because they are not prepared to devote the amount of time the task would take

because the number of assessment tasks set is unreasonable

because they have devoted the time and feel they deserve the mark

because their families want them to get a better mark

because the institution is inhumane

essentially . . .

Because the perceived chances of being caught and the perceived punishment if caught are less than the perceived benefit of cheating, at the time when the cheating occurs.


JISC Plagiarism Report

“. . . technology can only assist us, it will never replace the expertise of humans ... the answer to problems usually lies in process and procedures not technology alone. Electronic detection has its place in institutions but the real solutions lie in appropriate assessment mechanisms, supportive institutional culture, clear definitions of plagiarism and policies for dealing with it and adequate training for staff and students. If these areas are improved, the need, desire, and appeal of plagiarism can be taken away for most students.”


Implications for Practice

change the assignment specification for every presentation

assess process as well as product

assess at a higher level (of Bloom’s taxonomy)

individualise assessment tasks

it is your responsibility to educate your registrar about the exact nature of academic misconduct

it is your responsibility to educate your students about the boundaries between cooperation, collusion and copying

it is your responsibility to ensure that an average student can complete an assessed task in a reasonable time

participate in groupworks

innovate assessment techniques

4 Stage Process

collection
detection
confirmation
investigation

Detection is (always?) capricious

Source: Downloading Detectives, Satterwhite & Gerein

Search/Service    n found (of 146)    % found
TurnItIn          85                  58
Google            76                  52
Metacrawler       65                  44
Paperbin          63                  43
Altavista         53                  36
Findsame          38                  24
EVE               30                  20
HowOriginal       30                  20

None located a verbatim passage from the on-line Encyclopedia Britannica.

Capricion in Practice

3% as reported by the JISC service.

8 1/2% following tutor manual Google search.

9% following OrCheck on Ch4.

?~11% following full OrCheck investigation???


Ordered Originality List

not in order within bands


Originality Report

Final Year Project 2003 - 1

All ~315 projects were submitted to the JISC system (mostly by the students but some manually from our in-house submission system).

In addition, first and second markers were asked to flag any that they thought suspicious (more capricion?).

Flagged reports were investigated using OrCheck and a number that had been reported comparatively clean by JISC were shown to be significantly non-original.

Some that remained suspicious still reported clean (one was adjudged suitable for non-evidential investigation but dropped for lack of resources).

About 50 originality reports were visually examined and a number cleared (excessively long cited appendixes and common JavaScript in technical appendixes).

Final Year Project 2003 - 2

About 20 reports were categorised as ‘extensive’, ‘substantive’, or ‘significant’. Summary notes were made on all of these and JiscView and/or OrCheck visualisations produced.

The project panel decided to proceed with the 9 ‘extensive’ and ‘substantive’ cases.

First supervisors (some who should have known better!) were prepared to excuse extensive (~50%) demonstrated non-originality and/or suggested informal capping.

Of the 9 cases processed formally, penalties ranged from cancellation of all level 3 marks (and award of DipHE), through cancellation of the project mark (and award of unclassified), to cancellation of the project mark (but allowed to resubmit next year).

Quantitative Corpora Analysis

[Chart: Final Year Project Non-Originality. Vertical axis: non-originality, 0% to 60%. Horizontal axis: number of students, 0 to 300. Annotations: ‘ColdFusion’ area; hypothesised ‘real’ line.]


Revised Service - Ordered List


Revised Service - Originality Report


Revised Service - Side by Side Comparison

the two panes are not hyperlinked

Comments on the JISC service 1

The nature of the detection engine is unknown (although guesses can be made).

It is (necessarily) administratively cumbersome.

There is no facility for batch enrolment of students onto the system. (Possibly addressed.)

There is no batch submission of documents (although a tutor can submit on behalf of a student). (Possibly addressed.)

There is no facility for batch downloading. (I had to manually review about 50 originality reports over a weekend and had to obtain each one individually to take them home.)

There is no batch submission of additional URLs; each has to be submitted individually (with a re-analysis after each one).

The four hour turnaround on reanalysis of a document made semi-manual investigation cumbersome. (Addressed in the upgrade.)

Comments on the JISC service 2

There is no facility to integrate it with WebCT or BlackBoard.

The system has some aspects of an MLE (e.g. peer review, on-line grading). (Not in the JISC version.)

The precise quantitative degree of similarity is not stated or used to precisely order the list. (Possibly addressed.)

There is no side by side comparison of submission and hit(s). (Addressed in the upgrade.)

The significance and extent of the non-originality within the document can be unclear, particularly with large documents. (See the JiscView utility.)

The system can lose some hits (i.e. a hit reported may disappear if a reanalysis of the document is requested). (Addressed in the upgrade.)

There is no management reporting capability. (e.g. a convenient printer friendly list of all submissions received, etc.)


Comments on the JISC service 3

The sensitivity of the detection cannot be controlled (e.g. only consider runs of n words, exclude this domain, exclude anything from this base document, exclude hits below n%).

The use of red highlighting confused some tutors (they assumed that it was more significant than other colours). (A lesson for all tool designers!)

The submission of styled documents (RTF, *.DOC etc.) can be impacted by firewalls and congestion. (Inevitable with any such system and large documents.)

There is no facility to exclude non-discursive content or appendixes. (Many of the 15% hits reported were due to JavaScript in technical appendixes supplied by tools such as ColdFusion.)

The ‘open’ use of the system, where students can view the originality reports, may mislead students (and tutors?!) regarding the true nature of the document.
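The sensitivity controls asked for in the first comment on this slide (runs of n words, excluded domains, hits below n%) could be expressed as a simple post-filter over the reported hits. The following is a minimal sketch only: the `Hit` record, its field names and the example URLs are hypothetical and do not reflect how the JISC service stores or exposes its results.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Hit:
    """Hypothetical record for one reported match (not the JISC data model)."""
    source_url: str   # where the matching text was found
    run_words: int    # length of the matching run, in words
    percent: float    # share of the submission covered by this hit


def filter_hits(hits, min_run_words=0, min_percent=0.0, excluded_domains=()):
    """Keep only the hits that pass every sensitivity setting."""
    excluded = {d.lower() for d in excluded_domains}
    kept = []
    for hit in hits:
        domain = (urlparse(hit.source_url).hostname or "").lower()
        if hit.run_words < min_run_words:
            continue  # matching run is too short to matter
        if hit.percent < min_percent:
            continue  # hit covers too little of the submission
        if domain in excluded:
            continue  # e.g. a site whose boilerplate code is expected
        kept.append(hit)
    return kept


hits = [
    Hit("http://example.com/essay.html", run_words=40, percent=6.0),
    Hit("http://example.org/snippets.js", run_words=120, percent=15.0),
    Hit("http://example.net/page.html", run_words=5, percent=0.5),
]
for hit in filter_hits(hits, min_run_words=8, min_percent=1.0,
                       excluded_domains=["example.org"]):
    print(hit.source_url)
```

Excluding a base document or an appendix would need the hit to record where in the submission it occurred, which this sketch leaves out.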


JiscView

The JISC textual representations, whilst adequate for small documents, proved less useful for large projects. The colour coding did not give a precise quantitative measure and the relative location of the various non-original parts was also unclear.

To address these problems a small utility, JiscView, was developed to provide a high level, non-interactive, ‘map’ of a JISC non-originality report.

The utility may have been invalidated by the revised JISC service. It is only available upon request with many caveats and no documentation.


JiscView in Operation

A JiscView image contains one pixel for every character, colour coded as in the originality report. The width is arbitrary (just wide enough to accommodate the text at the top). It gives a precise quantitative measure of non-originality, in this case 24%.
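The one-cell-per-character idea can be sketched without any imaging library. This is an illustration only, not JiscView's code: the span format `(start, end, source_id)` is an assumption about what could be extracted from an originality report, and single letters stand in for the colours.

```python
def build_map(text_length, spans, width=60):
    """Return (rows, percent) for a one-cell-per-character originality map.

    spans: iterable of (start, end, source_id), end exclusive. These are
    assumed inputs standing in for the hits in an originality report.
    source_id should be a single character so that the rows stay aligned.
    """
    cells = ["."] * text_length            # "." marks an original character
    for start, end, source_id in spans:
        for i in range(max(start, 0), min(end, text_length)):
            cells[i] = source_id           # later spans overwrite earlier ones
    flagged = sum(1 for c in cells if c != ".")
    percent = 100.0 * flagged / text_length if text_length else 0.0
    rows = ["".join(cells[i:i + width]) for i in range(0, text_length, width)]
    return rows, percent


rows, percent = build_map(200, [(20, 50, "A"), (120, 138, "B")], width=50)
print("\n".join(rows))
print(f"non-original: {percent:.0f}%")
```

Because every character is counted exactly once, the percentage is a precise measure rather than a band, and the position of each flagged run within the document is visible at a glance.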


OrCheck

OriginalityChecker is an in-house, desktop, single-document, free-of-charge, database (Google) driven, text only, non-proprietary tool.

Essentially, it provides some assistance with the process of manually performing a Google driven keyword search and (in particular) with interpreting the extent and significance of any matches in the documents returned.

In the final year project investigation it was used to locate URLs to manually feed into the JISC service.

It was also used in ‘passive’ mode to prepare evidential reports for the investigation phase.
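The two halves of the manual process described above, choosing words to search for and judging how much of the document a returned page accounts for, can be sketched as follows. This is not OrCheck's code: the rule for choosing search terms (longest words outside a common-word list) and the use of three-word runs to measure the extent of a match are assumptions made for illustration, and the search step itself is left out because it depends on whatever search interface is available.

```python
import re
from collections import Counter


def words(text):
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9']+", text.lower())


def concordance(text):
    """Word -> number of occurrences in the document."""
    return Counter(words(text))


def candidate_terms(text, common_words, how_many=5):
    """Pick the longest words outside a common-word list as search terms.

    This selection rule is an assumption for illustration only.
    """
    unusual = [w for w in concordance(text) if w not in common_words]
    return sorted(unusual, key=lambda w: (-len(w), w))[:how_many]


def matched_fraction(document, candidate, run=3):
    """Fraction of the document's word runs that also occur in the candidate."""
    def runs(text):
        w = words(text)
        return [tuple(w[i:i + run]) for i in range(len(w) - run + 1)]

    doc_runs = runs(document)
    if not doc_runs:
        return 0.0
    cand_runs = set(runs(candidate))
    return sum(1 for r in doc_runs if r in cand_runs) / len(doc_runs)


document = "The quick brown fox jumps over the lazy dog near the riverbank"
candidate = "A quick brown fox jumps over a sleeping cat"
print(candidate_terms(document, {"the", "over", "near"}, how_many=3))
print(matched_fraction(document, candidate))
```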

OrCheck in Operation 1

document loaded; concordance generated

OrCheck in Operation 2

search in progress; hits obtained

OrCheck in Operation 3

textual comparison; graphical representation

PRAISE

Prioritised Ring to Assist In Similarity Evaluations is an in-house, desktop, intra-corporal, free-of-charge, stylistic, (text only), non-proprietary tool.

It is used to detect and display the degree of similarity between the documents in a corpus.

Although designed for text-only use it will operate upon styled texts (though its behaviour is somewhat unknown).

It uses the words2 metric, shown in Thomas Lancaster’s thesis to be efficient and effective.

It is intended to allow an OrCheck and/or VAST viewer to be spawned from it for detailed investigation.
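A pairwise comparison across a corpus can be sketched as below. The slides do not define the words2 metric, so this sketch assumes it is based on pairs of consecutive words shared between two documents; the scoring formula (shared pairs divided by all distinct pairs) is chosen here for illustration and may differ from the metric PRAISE actually uses.

```python
import re
from itertools import combinations


def word_pairs(text):
    """Set of consecutive word pairs in a text."""
    w = re.findall(r"[a-z0-9']+", text.lower())
    return set(zip(w, w[1:]))


def pair_similarity(a, b):
    """Shared word pairs divided by all distinct word pairs (0.0 to 1.0)."""
    pa, pb = word_pairs(a), word_pairs(b)
    if not pa and not pb:
        return 0.0
    return len(pa & pb) / len(pa | pb)


def rank_pairs(corpus):
    """corpus: dict name -> text. Returns [(score, name_a, name_b)], highest first."""
    scored = [(pair_similarity(corpus[x], corpus[y]), x, y)
              for x, y in combinations(sorted(corpus), 2)]
    return sorted(scored, key=lambda t: -t[0])


corpus = {
    "s1": "plagiarism detection tools assist the tutor",
    "s2": "plagiarism detection tools cannot replace the tutor",
    "s3": "students submit the full report in house",
}
for score, x, y in rank_pairs(corpus):
    print(f"{x} - {y}: {score:.2f}")
```

Every document is compared with every other, so the work grows with the square of the corpus size; for a few hundred projects that is still only tens of thousands of comparisons.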


PRAISE in Operation 1


PRAISE in Operation 2


PRAISE in Operation 3

The documents are arranged on the torc in gross similarity sequence. Controls are provided to vary the number of documents and the degree of similarity shown.

When one document is selected all other documents linked to it, at or above the similarity level are also shown. (From here an OrCheck visualisation will be launched).

When two documents (i.e. one link) are selected details of that degree of similarity are shown. (From here a VAST visualisation will be launched)

An alternative tabular view of the information also needs to be provided.

Extra-corporal Web sourced documents can be included and are shown in a different colour. (An OrCheck style capability to obtain such documents needs to be included.)
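The ordering and selection behaviour described on this slide can be sketched over a table of pairwise scores. This is a guess at the logic, not PRAISE's code: "gross similarity" is assumed here to mean each document's total similarity to all the others, and the document names and scores are made up.

```python
def gross_order(scores):
    """Order document names by total similarity to all others, highest first.

    scores: dict mapping frozenset({a, b}) -> similarity in [0, 1].
    """
    totals = {}
    for pair, value in scores.items():
        for name in pair:
            totals[name] = totals.get(name, 0.0) + value
    return sorted(totals, key=lambda n: (-totals[n], n))


def linked_to(selected, scores, threshold):
    """Documents linked to `selected` at or above the similarity threshold."""
    linked = []
    for pair, value in scores.items():
        if selected in pair and value >= threshold:
            (other,) = pair - {selected}   # the other end of the link
            linked.append((other, value))
    return sorted(linked, key=lambda item: (-item[1], item[0]))


scores = {
    frozenset({"s1", "s2"}): 0.40,
    frozenset({"s1", "s3"}): 0.10,
    frozenset({"s2", "s3"}): 0.25,
    frozenset({"s3", "web1"}): 0.05,   # a Web sourced document
}
print(gross_order(scores))
print(linked_to("s2", scores, threshold=0.2))
```

The same two functions would also serve the tabular view: the ordered names give the rows and the filtered links give the cells worth showing.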


VAST

Visual Analysis of Similarity Tool is an in-house, desktop, double-document, free-of-charge, stylistic driven, text only, non-proprietary tool.

It provides a detailed OrCheck-like visualisation and investigation of a pair of documents.

VAST is more capable of fuzzy matching than OrCheck and so is more capable of detecting similarity beneath superficial disguises. However, it is less precise in its highlighting and is unable to give a (precise) quantitative value to the similarity.

VAST can also be used to track changes in the drafts of a document.
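Comparing a pair of documents, or two drafts of one document, can be sketched with Python's standard `difflib` module. This is not VAST's algorithm, which the slides do not describe; `difflib` tolerates inserted, deleted and substituted words, which is only a limited form of fuzzy matching, and the `min_words` cut-off is an arbitrary choice.

```python
import difflib


def common_runs(text_a, text_b, min_words=3):
    """Runs of at least `min_words` words that appear in both texts."""
    a, b = text_a.lower().split(), text_b.lower().split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [" ".join(a[m.a:m.a + m.size])
            for m in matcher.get_matching_blocks() if m.size >= min_words]


def similarity_ratio(text_a, text_b):
    """difflib's 0.0 to 1.0 ratio over word sequences; tolerant of small edits."""
    a, b = text_a.lower().split(), text_b.lower().split()
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


draft = "electronic detection has its place but real solutions lie in assessment design"
later = "electronic detection has a place but the real solutions lie in assessment design"
print(common_runs(draft, later))
print(round(similarity_ratio(draft, later), 2))
```

The runs that survive small edits are the ones worth highlighting side by side; the ratio is only a rough guide and should not be read as a precise measure of similarity.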


VAST in Operation


FreeStyler

FreeStyler is an in-house, desktop, single-document, free-of-charge, stylistic, text only, non-proprietary tool.

It provides rolling-average, interactive graphs of various stylistic measurements. The intention is that if there is more than one ‘voice’ in a document, the differences should become visible in the graphs. (In practice this has not proved to be so easy!)

FreeStyler can also be used as a writing tool (checking reading age across a document, ensuring consistency of voice and spelling conventions etc.).
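A rolling average of one stylistic measurement can be sketched as below. Sentence length in words is used purely as an example; the slides do not say which measurements FreeStyler plots, and the sentence splitting here is deliberately crude.

```python
import re


def sentence_lengths(text):
    """Number of words in each sentence (split on ., ! and ?)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def rolling_average(values, window=3):
    """Mean of each run of `window` consecutive values."""
    if window <= 0 or len(values) < window:
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


text = ("I wrote this. It is short. I like it. "
        "Notwithstanding the aforementioned considerations, the methodological "
        "framework adopted herein necessitates further elaboration. "
        "Furthermore, the epistemological ramifications remain substantially "
        "underexplored in the extant literature.")
lengths = sentence_lengths(text)
print(lengths)
print(rolling_average(lengths, window=2))
```

A sustained step in the averaged series is the kind of change of ‘voice’ the graphs are meant to expose; a wider window smooths out single odd sentences at the cost of blurring where the change happens.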


FreeStylerIII in Operation

Final Year Project 2004

Inform students clearly and demonstrate the technology at the first project lecture (as was done in 2002/3).

Have students sign and return the JISC DPA form as part of project registration.

Encourage final year core unit tutors to use the JISC service routinely.

Require students to submit the body of the report (only) to JISC, but to submit the full report in-house.

Provide staff development and clear, agreed guidelines for all tutors regarding the significance of non-originality.

Have agreed time relief for coordinating the systems and advising on issues.