
Machine Learning Basics with Applications to Email Spam

Detection

UGR PROJECT - HAOYU LI, BRITTANY EDWARDS, WEI ZHANG UNDER XIAOXIAO XU AND ARYE NEHORAI 

General background information about the process of machine learning

The process of email detection

⦿ Motivation of this project

⦿ Pre-processing of data

⦿ Classifier Models
● Evaluation of classifiers

Motivation of this project

⦿ Spam email annoys every personal email account
● 60% of January 2004 emails were spam
● Fraud & phishing

⦿ Spam vs. Ham email

Our Goal

Spam Email example

Ham Email example

The process of email detection

⦿ Motivation of this project
⦿ Pre-processing of data

⦿ Classifier Models
● Evaluation of classifiers

Pre-processing of data

⦿ Convert capital letters to lowercase 

⦿ Remove numbers and extra white space

⦿ Remove punctuations 

⦿ Remove stop-words

⦿ Delete terms with length greater than 20. 
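The pre-processing steps above can be sketched in Python (an illustrative sketch, not the project's actual code; the stop-word list here is a small stand-in for the real one):

```python
import re
import string

# A small illustrative stop-word list; the project's actual list is not shown.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "for"}

def preprocess(text):
    # Convert capital letters to lowercase
    text = text.lower()
    # Remove numbers
    text = re.sub(r"\d+", " ", text)
    # Remove punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse extra white space and split into terms
    terms = text.split()
    # Remove stop-words and delete terms with length greater than 20
    return [t for t in terms if t not in STOP_WORDS and len(t) <= 20]
```

For example, `preprocess("Win $1000 NOW! Click the link")` yields `['win', 'now', 'click', 'link']`.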

Pre-processing of data

⦿ Original Email

Pre-processing of data

⦿ After pre-processing

Pre-processing of data

⦿ Extract Terms

Pre-processing of data

⦿ Reduce Terms
● Keep word length < 20

The process of email detection

⦿ Motivation of this project

⦿ Pre-processing of data
⦿ Classifier Models
● Evaluation of classifiers

Different classification methods

⦿ K Nearest Neighbor (KNN)

⦿ Naive Bayes Classifier

⦿ Logistic Regression

⦿ Decision Tree Analysis

What is K Nearest Neighbor

⦿ Use the k "closest" samples (nearest neighbors) to perform classification
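The idea can be sketched in a few lines of Python (a minimal illustration of majority voting over the k nearest samples, not the project's implementation; the term-count vectors are made up):

```python
import math
from collections import Counter

def knn_classify(query, samples, k=3):
    """Label a query vector by majority vote of its k nearest samples.

    samples: list of (feature_vector, label) pairs, where feature vectors
    are equal-length lists of term counts (values here are illustrative).
    """
    # Euclidean distance from the query to every training sample
    dists = [(math.dist(query, vec), label) for vec, label in samples]
    # Take the k closest samples and vote on their labels
    dists.sort(key=lambda d: d[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

samples = [([5, 0, 1], "spam"), ([4, 1, 0], "spam"),
           ([0, 3, 4], "ham"), ([1, 4, 3], "ham")]
print(knn_classify([4, 0, 2], samples))  # → spam
```

With term-count features the choice of distance metric and of k both matter; Euclidean distance and k = 3 are just one reasonable default here.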

What is K Nearest Neighbor

Initial outcome and strategies for improvement

⦿ KNN accuracy was ~64%, which is very low

⦿ KNN classifier does not fit our project 

⦿ Term-list is still too large 

⦿ Try a different classification method and see if the evaluation results are better than the KNN results

⦿ Continue to reduce size of term list by removing terms that are not meaningful

Steps for improvement

⦿ Remove sparsity
⦿ Reduce the length threshold
⦿ Create a hashtable
⦿ Use an alternative classifier
● Naive Bayes classifier

⦿ Calculate a hash key for each term in the term-list
⦿ When a collision occurs, use separate chaining

Hashtable

Naive Bayes classifier
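A multinomial Naive Bayes classifier with add-one (Laplace) smoothing can be sketched as follows (an illustrative sketch under standard assumptions, not the project's code; the example documents are made up):

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (term_list, label) pairs. Returns class priors and
    smoothed per-class term log-probabilities (multinomial model)."""
    class_docs = Counter(label for _, label in docs)
    term_counts = {c: Counter() for c in class_docs}
    vocab = set()
    for terms, label in docs:
        term_counts[label].update(terms)
        vocab.update(terms)
    model = {}
    for c in class_docs:
        total = sum(term_counts[c].values())
        model[c] = {
            "prior": math.log(class_docs[c] / len(docs)),
            # Add-one smoothing so unseen terms never get probability zero
            "logp": {t: math.log((term_counts[c][t] + 1) / (total + len(vocab)))
                     for t in vocab},
            "unseen": math.log(1 / (total + len(vocab))),
        }
    return model

def classify_nb(model, terms):
    # Pick the class maximizing log prior + sum of term log-likelihoods
    def score(c):
        m = model[c]
        return m["prior"] + sum(m["logp"].get(t, m["unseen"]) for t in terms)
    return max(model, key=score)

docs = [(["free", "money", "now"], "spam"), (["free", "offer"], "spam"),
        (["meeting", "notes"], "ham"), (["project", "meeting"], "ham")]
print(classify_nb(train_nb(docs), ["free", "money"]))  # → spam
```

Working in log space avoids numerical underflow when many term probabilities are multiplied together.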

Secondary Results

⦿ Classification accuracy increases from 62% to 82.36%

Suggestions for further improvement

⦿ Revise pre-processing
⦿ Apply additional classifiers

Thank you

⦿ Questions?
