Project IDI
David I Widjaja
Steps
1. Data Extraction
2. Tagging
3. Correlation
4. Web Scraping
5. Comparison
6. Documentation
Data Extraction
How to get the data?
Input from a database
Input manually
Data type: topics, stored as strings
Tagging
Prerequisites:
Topic sentences (Subject)
Dictionary (Tags)
Dictionary
How to create tags:
1. Get all topic sentences and split them on whitespace
2. Convert all words to lower case
3. Delete all numeric and duplicate values
4. Sort the words alphabetically
5. Delete unnecessary words (e.g. is, the, and)
6. Search for synonyms and cluster them into a single tag
7. Translate words if necessary
8. Insert the tags into the main spreadsheet
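Steps 1 to 6 above can be sketched in Python roughly as follows. This is a minimal illustration, not the procedure actually used in the project (which was done in a spreadsheet). The names build_tags, STOP_WORDS and SYNONYMS are hypothetical, the stop-word and synonym lists are made-up samples, and stripping punctuation is an added assumption. Translation (step 7) and the spreadsheet insert (step 8) are left out.

```python
import string

# Hypothetical stop-word list; the slides only name "is", "the" and "and".
STOP_WORDS = {"is", "the", "and", "a", "an", "of", "to", "in"}

# Hypothetical synonym map: each variant points to the tag it is clustered under.
SYNONYMS = {"cars": "car", "automobile": "car"}


def build_tags(topic_sentences):
    """Return a sorted list of unique tags derived from the topic sentences."""
    tags = set()
    for sentence in topic_sentences:
        for word in sentence.split():               # step 1: split on whitespace
            word = word.lower()                     # step 2: lower case
            word = word.strip(string.punctuation)   # assumption: drop surrounding punctuation
            if not word or word.isdigit():          # step 3: drop numeric values
                continue
            if word in STOP_WORDS:                  # step 5: drop unnecessary words
                continue
            tags.add(SYNONYMS.get(word, word))      # step 6: cluster synonyms; the set removes duplicates
    return sorted(tags)                             # step 4: sort alphabetically


tags = build_tags(["The car is red", "Cars and 2 bikes", "Automobile prices in 2013"])
print(tags)
```

Using a set for the intermediate result takes care of the duplicate removal in step 3, so the order of the steps in code differs slightly from the order on the slide.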
Correlation
A weighted graph map is used:
The more words associated with a tag, the bigger its bubble.
The more relationships between two topics, the thicker the line connecting them.
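One possible way to compute the two weights is sketched below. It assumes that a bubble's size is the number of topic sentences carrying the tag, and that a line's thickness is the number of topic sentences in which both tags appear together; the slides do not define the weights more precisely than this. The function name graph_weights and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def graph_weights(topic_tags):
    """topic_tags: one collection of tags per topic sentence.

    Returns (node_size, edge_weight), both as Counters.
    """
    node_size = Counter()
    edge_weight = Counter()
    for tags in topic_tags:
        unique = sorted(set(tags))                    # count each tag once per sentence
        node_size.update(unique)                      # bubble size
        edge_weight.update(combinations(unique, 2))   # line thickness per tag pair
    return node_size, edge_weight


topics = [["car", "price"], ["car", "engine"], ["car", "price", "engine"]]
nodes, edges = graph_weights(topics)
print(nodes)
print(edges)
```

Sorting the tags before pairing them keeps each pair in one canonical order, so ("car", "price") and ("price", "car") are never counted as separate lines.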
Web Scraping
Scrape other, similar websites.
Put the scraped topic sentences in the Subject column. Examples: article titles, comments, etc.
Copy them into the previous spreadsheet (the one with the previous tags).
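As an illustration of pulling topic sentences out of downloaded HTML, here is a small sketch using only Python's standard library. It assumes the article titles sit inside h2 elements, which is a made-up pattern; on a real site the pattern would first have to be found with Inspect Element. The class name TitleExtractor and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text inside every <h2> element (hypothetical title markup)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._buffer = []
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_title = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_title:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_title:
            self.titles.append("".join(self._buffer).strip())
            self._in_title = False


# Made-up page content standing in for a downloaded HTML file.
sample_html = """
<html><body>
  <h2>First article title</h2>
  <p>Body text that is ignored.</p>
  <h2>Second <em>article</em> title</h2>
</body></html>
"""

parser = TitleExtractor()
parser.feed(sample_html)
titles = parser.titles
print(titles)
```

The resulting list of titles is what would go into the Subject column before tagging.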
Correlation
Repeat the same process as before to build a weighted graph map for the scraped data.
Comparison
Compare the two weighted graph maps.
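The slides do not say how the two maps are compared. One simple option, shown here purely as an assumption, is to list the tags both maps share (with both weights side by side) and the tags that appear in only one of them. The function name compare_weights and the sample weights are hypothetical.

```python
def compare_weights(own, other):
    """Compare two {tag: weight} maps.

    Returns (shared, only_own, only_other), where shared maps each common
    tag to a (own_weight, other_weight) pair.
    """
    shared = {tag: (own[tag], other[tag]) for tag in own.keys() & other.keys()}
    only_own = sorted(own.keys() - other.keys())
    only_other = sorted(other.keys() - own.keys())
    return shared, only_own, only_other


# Made-up bubble sizes for the original data and the scraped data.
own_tags = {"car": 3, "price": 2, "engine": 2}
scraped_tags = {"car": 5, "price": 1, "fuel": 4}

shared, only_own, only_other = compare_weights(own_tags, scraped_tags)
print(shared, only_own, only_other)
```

The same function works for line weights if the keys are tag pairs instead of single tags.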
Word Cloud
Generate a word cloud using Python or online tools.
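The core of a word cloud is mapping word frequency to font size. The sketch below shows only that step, using the standard library and a linear scale; it does not draw anything and is not the tool used in the project. The function name font_sizes, the point-size range and the sample words are assumptions.

```python
from collections import Counter


def font_sizes(words, min_pt=10, max_pt=48):
    """Map each word to a font size that grows linearly with its frequency."""
    counts = Counter(words)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for word, n in counts.items():
        # If every word is equally frequent, give all of them the largest size.
        scale = (n - low) / span if span else 1.0
        sizes[word] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


sizes = font_sizes(["car", "car", "car", "price", "price", "fuel"])
print(sizes)
```

A drawing library or an online tool would then take care of laying the words out at these sizes.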
Tools
Microsoft Excel 2013 (Spreadsheet)
Mozilla Firefox (Browser)
  Inspect Element (Search Patterns)
  DownThemAll (Download HTMLs)
Total Commander (Merge HTMLs)
Notepad++ (Cleanse Data)