ChatCoder: Toward the Tracking and Categorization of Internet Predators
April Kontostathis
Lynne Edwards
Amanda Leatherman
Ursinus College
April Kontostathis, Department of Mathematics and Computer Science
Where are we coming from?
Spring/Summer 2008
◦ Amanda Leatherman, Ursinus class of 2009, approaches Lynne Edwards, Associate Professor of Media and Communication Studies, about a new project.
Summer 2009
Amanda and Lynne research related work
◦ Olson, L. N., Daggs, J. L., Ellevold, B. L., & Rogers, T. K. (2007). The communication of deviance: Toward a theory of child sexual predators' luring communication. Communication Theory, 17, 231-251.
Lynne and Amanda channel this project in two directions
◦ Modify the theory for the online environment
◦ Operationalize the theory
Original LCT Model (Olson et al.)
Gaining Access
◦ Characteristics of the perpetrator
◦ Characteristics of the victim
◦ Strategic placement
Deceptive Trust Development
Grooming
◦ Communicative desensitization
◦ Reframing
Isolation
Approach
Process
Read many transcripts from Perverted-justice.com
◦ … not an appealing job
Meanwhile …
I am planning a Fall 2008 Software Engineering course – looking for projects to assign to students
Lynne asks if my students can build a system to find phrases in the perverted-justice transcripts
… a collaboration is born!
Where are we now?
Revised LCT Model
Gaining Access
◦ Strategic Placement
Deceptive Trust Development
◦ Activities
◦ Compliments
◦ Personal Information Exchange
◦ Relationship Exchange
Grooming
◦ Communicative Desensitization
◦ Reframing
Isolation
Approach
Categorization Experiments
First Experiment
◦ Class: {Predator, Victim}
  32 instances, 16 in each class (talking to each other)
◦ Eight numeric attributes: count of tagged phrases in each category
  Activities, Personal Information, Compliments, Relationship, Reframing, Desensitization, Isolation, Approach
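A minimal sketch of how the eight count attributes could be produced for one speaker. The phrase lists here are hypothetical placeholders (the slides do not give ChatCoder's actual phrase dictionary), and plain substring matching is an assumption, not necessarily how ChatCoder tags phrases.

```python
# Hypothetical phrase lists, for illustration only. Two categories are left
# empty to show that a category with no matches yields a count of zero.
CATEGORY_PHRASES = {
    "Activities": ["favorite band", "what do you like to do"],
    "PersonalInformation": ["how old are you", "where do you live"],
    "Compliments": ["you are cute", "nice pic"],
    "Relationship": ["boyfriend", "girlfriend"],
    "Reframing": [],
    "Desensitization": [],
    "Isolation": ["are you alone"],
    "Approach": ["can we meet"],
}

def count_categories(lines):
    """Return one count per category for the chat lines of a single speaker."""
    counts = {category: 0 for category in CATEGORY_PHRASES}
    for line in lines:
        text = line.lower()
        for category, phrases in CATEGORY_PHRASES.items():
            counts[category] += sum(text.count(p) for p in phrases)
    return counts

example = ["How old are you?", "you are cute, where do you live"]
print(count_categories(example))
```

Each transcript participant then becomes one instance: a vector of eight counts plus a class label.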
Results
Classifier: C4.5 (J48 in Weka)
3-fold cross validation
Success Rate: 59%
◦ baseline 50%
Confusion matrix
                   Classified as Predator   Classified as Victim
Actual Predator    8                        8
Actual Victim      5                        11
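As a check on the arithmetic, the success rate is the diagonal of the confusion matrix divided by the total number of instances. The sketch below uses the four counts from this slide; the function name is an illustrative choice.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal (rows = actual, columns = predicted)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Rows: actual Predator, actual Victim. Columns: classified Predator, classified Victim.
first_experiment = [[8, 8], [5, 11]]
print(round(100 * accuracy(first_experiment)))  # 59
```

With 19 of 32 instances correct, the tree is only modestly better than the 50% baseline for two equal-sized classes.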
Decision Tree
DesensitizationCount <= 35
| RelationshipCount <= 0
| | ActivitiesCount <= 1
| | | IsolationCount <= 5: Predator (5.0/1.0)
| | | IsolationCount > 5: Victim (4.0)
| | ActivitiesCount > 1: Predator (2.0)
| RelationshipCount > 0: Victim (10.0)
DesensitizationCount > 35: Predator (11.0/1.0)
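Read as code, the tree above is a short chain of threshold tests. The sketch below transcribes it by hand into a Python function; the dictionary keys reuse the attribute names from the Weka output, and the function itself is illustrative rather than anything Weka produces.

```python
def classify(counts):
    """Apply the J48 tree above to a dict of category counts for one speaker."""
    if counts["DesensitizationCount"] > 35:
        return "Predator"
    if counts["RelationshipCount"] > 0:
        return "Victim"
    if counts["ActivitiesCount"] > 1:
        return "Predator"
    # Remaining branch: low desensitization, no relationship talk, few activities.
    return "Predator" if counts["IsolationCount"] <= 5 else "Victim"

speaker = {
    "DesensitizationCount": 12,
    "RelationshipCount": 0,
    "ActivitiesCount": 1,
    "IsolationCount": 7,
}
print(classify(speaker))  # Victim
```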
Categorization Experiments
Second Experiment
◦ Class: {PJ, Non-PJ}
  29 instances: 14 PJ transcripts, 15 Non-PJ
  Non-PJ obtained from Dr. Susan Gauch, collected during her ChatTrack project
  For PJ transcripts, both Victim and Predator were coded
◦ Same eight attributes
Results
Classifier: C4.5 (J48 in Weka)
3-fold cross validation
Success Rate: 93%
◦ baseline 48%
Confusion matrix
                   Classified as Not PJ   Classified as PJ
Actually Not PJ    15                     0
Actually PJ        2                      12
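The same matrix also gives per-class precision and recall, which show where the two errors fall. This is a small sketch using the counts from this slide; the function name and the class index convention are illustrative.

```python
def precision_recall(matrix, cls):
    """Precision and recall for class index cls (rows = actual, columns = predicted)."""
    true_positives = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)  # column total
    actual = sum(matrix[cls])                    # row total
    return true_positives / predicted, true_positives / actual

# Index 0 = Not PJ, index 1 = PJ.
second_experiment = [[15, 0], [2, 12]]
precision, recall = precision_recall(second_experiment, 1)
print(precision, round(recall, 2))  # 1.0 0.86
```

Every transcript labeled PJ really was PJ; the two errors are PJ transcripts that were classified as Not PJ.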
Clustering Experiments
All 288 PJ Transcripts
K Means Clustering
Same eight attributes
◦ column normalized
Four Clusters found
◦ minimum intra-cluster variation
◦ multiple runs to avoid local minima
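A minimal sketch of this procedure in plain Python: normalize each column, run k-means from several random starts, and keep the run with the smallest within-cluster sum of squares. The slides do not say which normalization or tool was used, so dividing each column by its maximum, the restart count, and the toy data are all assumptions.

```python
import random

def normalize_columns(rows):
    """Divide each column by its maximum (assumed scheme; counts are non-negative)."""
    maxima = [max(col) or 1.0 for col in zip(*rows)]
    return [[value / m for value, m in zip(row, maxima)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(rows, k, rng, max_iter=100):
    """One k-means run from random initial centers; returns (sse, assignments)."""
    centers = [list(r) for r in rng.sample(rows, k)]
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for c in range(k):
            members = [r for r, a in zip(rows, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(r, centers[a]) for r, a in zip(rows, assign))
    return sse, assign

def kmeans(rows, k, restarts=10, seed=0):
    """Keep the restart with the lowest within-cluster sum of squares."""
    rng = random.Random(seed)
    return min((kmeans_once(rows, k, rng) for _ in range(restarts)), key=lambda r: r[0])

# Toy two-attribute data with two obvious groups (not the real transcript counts).
toy = normalize_columns([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
sse, labels = kmeans(toy, k=2)
print(labels)
```

For the experiment described here, each row would hold the eight category counts for one transcript and k would be 4.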
Labeling the Clusters
60 Transcripts Analyzed Closely
Age Deception Data Categorized
◦ Four distinct ways that deception can be achieved when communicating with others
  1. Quantity
  2. Quality
  3. Relation
  4. Manner
McCornack, S.A., Levine, T.R., Solowczuk, K.A., Torres, H.I., & Campbell, D.M. (1992). When the alteration of information is viewed as deception: An empirical test of information manipulation theory. Communication Monographs, 59, 17-29.
Age data captured for all 288 transcripts
Age Deception Statistics
                       Number of Transcripts   Percentage of Transcripts
No discussion of age   3                       5%
Honest Predators       36                      60%
Deceptive Predators    21                      35%
Type of Deception
Quantity manipulation findings
◦ Honest predators' average real age was 31 yrs old
◦ Deceptive predators' average real age was 38 yrs old
Quality manipulation findings
◦ Average age given by deceptive predators was 27 yrs old
Relation and Manner manipulation findings
◦ Rarely used by online sexual predators
Age Labeling – a bust
Cluster Total Honest Percent
C0 70 50 71%
C1 173 112 65%
C2 16 12 75%
C3 27 20 74%
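The percentages in this table can be recomputed from the Total and Honest columns. The sketch below does that and reports the spread between clusters; reading the narrow spread as the reason the labeling was "a bust" is an interpretation, not something the slide states.

```python
# (total transcripts, honest predators) per cluster, copied from the table above.
clusters = {"C0": (70, 50), "C1": (173, 112), "C2": (16, 12), "C3": (27, 20)}

def honest_percent(total, honest):
    """Share of honest predators in a cluster, as a rounded percentage."""
    return round(100 * honest / total)

percents = {name: honest_percent(t, h) for name, (t, h) in clusters.items()}
spread = max(percents.values()) - min(percents.values())
print(percents)  # {'C0': 71, 'C1': 65, 'C2': 75, 'C3': 74}
print(spread)    # 10
```

All four clusters fall within ten percentage points of each other, so age honesty does little to tell the clusters apart.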
Synergistic Activities
Content Analysis for the Web 2.0
◦ Misbehavior Detection Task
Pendar, Nick (2007). "Toward Spotting the Pedophile: Telling victim from predator in text chats." In The Proceedings of the First IEEE International Conference on Semantic Computing: 235-241. Irvine, California.
◦ Study for the Termination of Online Predators (STOP)
Hughes, D., P. Rayson, J. Walkerdine, K. Lee, P. Greenwood, A. Rashid, C. May-Chahal, and M. Brennan. 2008. Supporting Law Enforcement in Digital Communities through Natural Language Analysis. In the Proceedings of the 2nd International Workshop on Computational Forensics (IWCF'08). Washington D.C., USA, August 2008.
◦ Isis – Protecting Children in Online Social Networks
Where are we going?
Data remains a big problem
◦ PJ data is problematic
◦ Access to large chat or "chat-like" collections is hard to get
Labeling is a bigger problem
◦ Finding predatory chat is a "needle in haystack" problem
Applications are nice, but applications need to be grounded in text mining and communicative theory research.
Acknowledgements
Amanda Leatherman
Lynne Edwards
Kristina Moore
Brian D. Davison and students at Lehigh Univ.
Ursinus College
◦ Media and Communication Studies faculty and students
◦ Mathematics and Computer Science faculty and students
Text Mining Workshop organizers and reviewers
Contact Information
April Kontostathis
Ursinus College
[email protected]
http://webpages.ursinus.edu/akontostathis
610-409-3000 x2650