
ChatCoder: Toward the Tracking and Categorization of Internet Predators

April Kontostathis, Lynne Edwards, Amanda Leatherman
Ursinus College


April Kontostathis, Department of Mathematics and Computer Science

Where are we coming from?

Spring/Summer 2008

◦ Amanda Leatherman, Ursinus class of 2009, approaches Lynne Edwards, Associate Professor of Media and Communication Studies, about a new project.


Summer 2009

Amanda and Lynne research related work:

Olson, L. N., Daggs, J. L., Ellevold, B. L., & Rogers, T. K. (2007). The communication of deviance: Toward a theory of child sexual predators' luring communication. Communication Theory, 17, 231-251.

Lynne and Amanda channel this project in two directions:
◦ Modify the theory for the online environment
◦ Operationalize the theory


Original LCT Model (Olson et al.)

Gaining Access
◦ Characteristics of the perpetrator
◦ Characteristics of the victim
◦ Strategic placement
Deceptive Trust Development
Grooming
◦ Communicative desensitization
◦ Reframing
Isolation
Approach


Process

Read many transcripts from Perverted-justice.com

◦ … not an appealing job


Meanwhile …

I am planning a Fall 2008 Software Engineering course – looking for projects to assign to students

Lynne asks if my students can build a system to find phrases in the perverted-justice transcripts

… a collaboration is born!

Where are we now?

Revised LCT Model

Gaining Access
◦ Strategic Placement
Deceptive Trust Development
◦ Activities
◦ Compliments
◦ Personal Information Exchange
◦ Relationship Exchange
Grooming
◦ Communicative Desensitization
◦ Reframing
Isolation
Approach


Categorization Experiments

First Experiment
◦ Class: {Predator, Victim}
  32 instances, 16 in each class (talking to each other)
◦ Eight numeric attributes: count of tagged phrases in each category
  Activities, Personal Information, Compliments, Relationship, Reframing, Desensitization, Isolation, Approach
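The attribute vector described above can be sketched as follows. This is only an illustration of counting tagged phrases per category: the `PHRASES` dictionary, its placeholder entries, and the `count_attributes` helper are assumptions made for the example and are not the project's actual coding dictionary or software.

```python
from collections import Counter

# The eight categories named on the slide, in a fixed order.
CATEGORIES = ["Activities", "PersonalInformation", "Compliments", "Relationship",
              "Reframing", "Desensitization", "Isolation", "Approach"]

# Hypothetical placeholder phrases; the real dictionary is not given in the slides.
PHRASES = {
    "Activities": ["favorite band", "what do you like to do"],
    "PersonalInformation": ["how old are you", "where do you live"],
    "Compliments": ["you are pretty"],
    "Relationship": ["boyfriend"],
    "Reframing": [],        # left empty in this sketch
    "Desensitization": [],  # left empty in this sketch
    "Isolation": ["are you alone", "home alone"],
    "Approach": ["meet me", "come over"],
}

def count_attributes(lines):
    """Return one phrase count per category for one speaker's chat lines."""
    counts = Counter()
    for line in lines:
        text = line.lower()
        for category, phrases in PHRASES.items():
            for phrase in phrases:
                counts[category] += text.count(phrase)
    return [counts[c] for c in CATEGORIES]

vector = count_attributes(["How old are you?", "Are you alone? Can you meet me?"])
print(dict(zip(CATEGORIES, vector)))
```

Each speaker in a transcript would yield one such eight-element vector, which is the instance format a classifier such as J48 expects.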


Results

Classifier: C4.5 (J48 in Weka)
3-fold cross validation
Success Rate: 59%
◦ baseline 50%
Confusion matrix

                   Classified as Predator   Classified as Victim
Actual Predator    8                        8
Actual Victim      5                        11
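The success rate follows directly from the confusion matrix: 8 + 11 = 19 of the 32 instances lie on the diagonal, or about 59%. A minimal sketch of that arithmetic is below; the function names are chosen for this example, and the baseline here is assumed to mean always guessing the largest class.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def majority_baseline(matrix):
    """Accuracy of always guessing the largest actual class (an assumed baseline)."""
    total = sum(sum(row) for row in matrix)
    return max(sum(row) for row in matrix) / total

# Rows: actual Predator, actual Victim. Columns: classified as Predator, Victim.
first_experiment = [[8, 8], [5, 11]]
print(f"accuracy = {accuracy(first_experiment):.0%}")
print(f"baseline = {majority_baseline(first_experiment):.0%}")
```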


Decision Tree

DesensitizationCount <= 35
|   RelationshipCount <= 0
|   |   ActivitiesCount <= 1
|   |   |   IsolationCount <= 5: Predator (5.0/1.0)
|   |   |   IsolationCount > 5: Victim (4.0)
|   |   ActivitiesCount > 1: Predator (2.0)
|   RelationshipCount > 0: Victim (10.0)
DesensitizationCount > 35: Predator (11.0/1.0)
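Read as code, the tree is a short chain of threshold tests. In J48 output, a leaf annotation such as (5.0/1.0) gives the number of training instances that reach the leaf and how many of those are misclassified. The sketch below transcribes the tree shown above into a function; the `classify` name and the dictionary input format are assumptions made for this example.

```python
def classify(counts):
    """Apply the tree above to a dict of tagged-phrase counts for one speaker."""
    if counts["DesensitizationCount"] > 35:
        return "Predator"   # leaf (11.0/1.0)
    if counts["RelationshipCount"] > 0:
        return "Victim"     # leaf (10.0)
    if counts["ActivitiesCount"] > 1:
        return "Predator"   # leaf (2.0)
    if counts["IsolationCount"] <= 5:
        return "Predator"   # leaf (5.0/1.0)
    return "Victim"         # leaf (4.0)

example = {"DesensitizationCount": 12, "RelationshipCount": 2,
           "ActivitiesCount": 0, "IsolationCount": 1}
print(classify(example))
```

The leaf counts sum to 32, the full training set, with two training instances misclassified.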

Predator vs. Victim Patterns


Categorization Experiments

Second Experiment
◦ Class: {PJ, Non-PJ}
  29 instances: 14 PJ transcripts, 15 Non-PJ
  Non-PJ obtained from Dr. Susan Gauch, collected during her ChatTrack project
  For PJ transcripts, both Victim and Predator were coded
◦ Same eight attributes


Results

Classifier: C4.5 (J48 in Weka)
3-fold cross validation
Success Rate: 93%
◦ baseline 48%
Confusion matrix

                   Classified as Not PJ   Classified as PJ
Actually Not PJ    15                     0
Actually PJ        2                      12
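Beyond the overall success rate (27 of the 29 matrix entries are on the diagonal, about 93%), the matrix shows that both errors are PJ transcripts labeled as Not PJ. Per-class precision and recall make that visible; the following is a small sketch with a helper name chosen for this example, not part of the original analysis.

```python
def precision_recall(matrix, k):
    """Precision and recall for class k (rows are actual, columns are predicted)."""
    true_pos = matrix[k][k]
    predicted = sum(row[k] for row in matrix)  # column total
    actual = sum(matrix[k])                    # row total
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall

# Rows: actually Not PJ, actually PJ. Columns: classified as Not PJ, PJ.
second_experiment = [[15, 0], [2, 12]]
for k, name in enumerate(["Not PJ", "PJ"]):
    p, r = precision_recall(second_experiment, k)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```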

Non PJ vs. PJ


Clustering Experiments

All 288 PJ Transcripts
K Means Clustering
Same eight attributes
◦ column normalized
Four Clusters found
◦ minimum intra-cluster variation
◦ multiple runs to avoid local minima
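The procedure on this slide (normalize each column, run k-means several times, keep the run with the least intra-cluster variation) can be sketched in plain Python. This is a minimal illustration, not the tool used in the study. The slides do not say which normalization was applied, so min-max scaling is an assumption here, as are the function names, the restart count, and the seed.

```python
import random

def normalize_columns(rows):
    """Min-max scale each column to [0, 1] (assumed scheme); constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - l) / s for v, l, s in zip(row, lo, span)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from random starting centers."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    variation = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return variation, labels

def kmeans(points, k, restarts=20, seed=0):
    """Repeat k-means and keep the run with the smallest intra-cluster variation."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[0])

# Toy data with two obvious groups; the study used k=4 on eight attributes.
toy = normalize_columns([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]])
variation, labels = kmeans(toy, k=2, restarts=5, seed=1)
print(labels)
```

Restarting matters because a single run can settle in a local minimum that depends on the initial centers; comparing the total within-cluster squared distance across runs is one common way to choose among them.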


Clusters Found


Labeling the Clusters

60 Transcripts Analyzed Closely
Age Deception Data Categorized
◦ Four distinct ways that deception can be achieved when communicating with others
1. Quantity
2. Quality
3. Relation
4. Manner

McCornack, S.A., Levine, T.R., Solowczuk, K.A., Torres, H.I., & Campbell, D.M. (1992). When the alteration of information is viewed as deception: An empirical test of information manipulation theory. Communication Monographs, 59, 17-29.

Age data captured for all 288 transcripts


Age Deception Statistics

                       Number of Transcripts   Percentage of Transcripts
No discussion of age   3                       5%
Honest Predators       36                      60%
Deceptive Predators    21                      35%


Type of Deception

Quantity manipulation findings
◦ Honest predators' average real age was 31 years old
◦ Deceptive predators' average real age was 38 years old

Quality manipulation findings
◦ Average age given by deceptive predators was 27 years old

Relation and Manner manipulation findings
◦ Rarely used by online sexual predators


Age Labeling – a bust

Cluster   Total   Honest   Percent
C0        70      50       71%
C1        173     112      65%
C2        16      12       75%
C3        27      20       74%
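One way to read the table: the share of honest predators is similar in every cluster, so age honesty does not separate the clusters. The sketch below recomputes the percentages and the pooled rate from the counts in the table; the variable names are chosen for this example.

```python
# (total transcripts, honest predators) per cluster, copied from the table.
clusters = {"C0": (70, 50), "C1": (173, 112), "C2": (16, 12), "C3": (27, 20)}

rates = {name: honest / total for name, (total, honest) in clusters.items()}

# Pooled rate over the transcripts listed in the table (the totals sum to 286).
all_total = sum(total for total, _ in clusters.values())
all_honest = sum(honest for _, honest in clusters.values())
overall = all_honest / all_total

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} honest")
print(f"overall: {overall:.0%} honest")
```

Every cluster falls within roughly ten percentage points of the pooled rate, which is consistent with the slide's verdict.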


Synergistic Activities

Content Analysis for the Web 2.0
◦ Misbehavior Detection Task
Pendar, Nick (2007). "Toward Spotting the Pedophile: Telling victim from predator in text chats." In The Proceedings of the First IEEE International Conference on Semantic Computing: 235-241. Irvine, California.
◦ Study for the Termination of Online Predators (STOP)
Hughes, D., P. Rayson, J. Walkerdine, K. Lee, P. Greenwood, A. Rashid, C. May-Chahal, and M. Brennan. 2008. Supporting Law Enforcement in Digital Communities through Natural Language Analysis. In the proceedings of the 2nd International Workshop on Computational Forensics (IWCF’08). Washington D.C., USA, August 2008.
◦ Isis – Protecting Children in Online Social Networks


Where are we going?

Data remains a big problem
◦ PJ data is problematic
◦ Access to large chat or “chat-like” collections is hard to get

Labeling is a bigger problem

◦ Finding predatory chat is a “needle in haystack” problem

Applications are nice, but applications need to be grounded in text mining and communicative theory research.


Acknowledgements

Amanda Leatherman
Lynne Edwards
Kristina Moore
Brian D. Davison and students at Lehigh Univ.
Ursinus College

◦ Media and Communication Studies faculty and students

◦ Mathematics and Computer Science faculty and students

Text Mining Workshop organizers and reviewers


Contact Information

April Kontostathis
Ursinus College
[email protected]
webpages.ursinus.edu/akontostathis
610-409-3000 x2650