AIDR Tutorial
Muhammad Imran
Research Scientist
Qatar Computing Research Institute, HBKU
Doha, Qatar
http://aidr.qcri.org/

Data Collection in AIDR
• Twitter data collection strategies that AIDR supports
  – By keywords
  – By geographical regions
    • Strict: coordinates strictly inside the geo boundaries
    • Approximate: tweets from a place that overlaps with the geo boundaries
  – By following Twitter users
  – By keywords + regions
    • Tweets that match any of the keywords and are within the geo boundaries
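
The difference between strict and approximate region matching can be sketched as follows. This is only an illustration, not AIDR's actual code: the `Box` type, the function names, and the idea of passing a tweet's exact point and its place bounding box separately are assumptions made for the example. Boxes that cross the antimeridian are not handled.

```python
from typing import NamedTuple, Optional, Tuple


class Box(NamedTuple):
    """Hypothetical bounding box in degrees: west/south/east/north edges."""
    west: float
    south: float
    east: float
    north: float


def contains_point(box: Box, lon: float, lat: float) -> bool:
    """True when the point lies inside the box (edges included)."""
    return box.west <= lon <= box.east and box.south <= lat <= box.north


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when the two boxes share at least one point."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def matches_region(region: Box,
                   point: Optional[Tuple[float, float]] = None,
                   place_box: Optional[Box] = None,
                   strict: bool = True) -> bool:
    """Strict mode: only exact coordinates inside the region count.
    Approximate mode: a place box overlapping the region also counts."""
    if point is not None and contains_point(region, point[0], point[1]):
        return True
    if not strict and place_box is not None:
        return boxes_overlap(region, place_box)
    return False
```

With a region covering the San Francisco area, a tweet carrying only a place box that partly overlaps the region would be kept in approximate mode and dropped in strict mode.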

Data Collection Using Keywords
• Keyword limit = 400
• One keyword can be a single word like “Suffolk” or a phrase like “Suffolk accident”
• One keyword/phrase cannot be more than 60 bytes (1 character = 1 byte for plain ASCII text)
• Generic keywords collect irrelevant tweets
• Specific keywords are more likely to collect relevant tweets
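
The two limits above can be checked before a collection is started. The sketch below is an assumption about how such a check might look, not part of AIDR: it assumes keywords arrive as one comma-separated string and measures length in UTF-8 bytes, so non-ASCII characters count as more than one byte. The function and constant names are hypothetical.

```python
from typing import List

MAX_KEYWORDS = 400       # limit stated on the slide
MAX_KEYWORD_BYTES = 60   # per keyword or phrase, stated on the slide


def validate_keywords(raw: str) -> List[str]:
    """Split a comma-separated keyword string and enforce both limits.

    Raises ValueError when there are too many keywords or when a single
    keyword/phrase is longer than 60 bytes once encoded as UTF-8.
    """
    keywords = [part.strip() for part in raw.split(",") if part.strip()]
    if len(keywords) > MAX_KEYWORDS:
        raise ValueError(
            f"{len(keywords)} keywords given, limit is {MAX_KEYWORDS}")
    for keyword in keywords:
        size = len(keyword.encode("utf-8"))
        if size > MAX_KEYWORD_BYTES:
            raise ValueError(
                f"keyword {keyword!r} is {size} bytes, "
                f"limit is {MAX_KEYWORD_BYTES}")
    return keywords
```

For example, `validate_keywords("Suffolk, Suffolk accident")` returns the two cleaned entries, while a 61-character keyword raises `ValueError`.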

Location-based Collection
• Bounding boxes do not act as filters for other filter parameters. For example: keyword=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.
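
The OR behaviour described above, and the AND behaviour of a "keywords + regions" collection, can be contrasted in a small sketch. It assumes a tweet is a dictionary with hypothetical `text` and `point` keys (`point` is a `(lon, lat)` pair or `None`) and uses simple case-insensitive substring matching, which is a simplification rather than Twitter's or AIDR's real matching rule.

```python
from typing import Dict, List, Optional, Tuple

# (west, south, east, north) in degrees
BoxTuple = Tuple[float, float, float, float]


def keyword_match(text: str, keywords: List[str]) -> bool:
    """Case-insensitive substring match against any keyword (simplified)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def in_box(point: Optional[Tuple[float, float]], box: BoxTuple) -> bool:
    """True when the tweet has coordinates and they fall inside the box."""
    if point is None:
        return False
    lon, lat = point
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def or_match(tweet: Dict, keywords: List[str], box: BoxTuple) -> bool:
    """Either condition is enough: the box does not filter keyword hits."""
    return keyword_match(tweet["text"], keywords) or in_box(tweet["point"], box)


def and_match(tweet: Dict, keywords: List[str], box: BoxTuple) -> bool:
    """Both conditions are required, as in a keywords + regions collection."""
    return keyword_match(tweet["text"], keywords) and in_box(tweet["point"], box)
```

A tweet that mentions "Twitter" but has no coordinates passes `or_match` and fails `and_match`, which is why a keywords + regions collection returns fewer tweets than the raw combined filter.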

Following Twitter Users
For each user specified, the tool will collect:
• Tweets created by the user.
• Tweets which are retweeted by the user.
• Replies to any Tweet created by the user.
• Retweets of any Tweet created by the user.
• Manual replies, created without pressing a reply button (e.g. “@twitterapi I agree”).
The tool will not collect:
• Tweets mentioning the user (e.g. “Hello @twitterapi!”).
• Manual Retweets created without pressing a Retweet button (e.g. “RT @twitterapi The API is great”).
• Tweets by protected users.
Use a comma-separated list of Twitter user IDs (http://gettwitterid.com/)
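
The inclusion and exclusion rules above can be expressed as a single predicate. This is a rough sketch under stated assumptions, not AIDR's or Twitter's implementation: a tweet is modelled as a dictionary with hypothetical keys `author_id`, `in_reply_to_user_id`, `retweeted_author_id` and `protected`. It is assumed that `in_reply_to_user_id` is set for manual replies as well as button replies, and that `retweeted_author_id` is set only for Retweets made with the Retweet button.

```python
from typing import Dict, List, Set


def parse_user_ids(raw: str) -> Set[str]:
    """Turn a comma-separated list of user IDs into a set of strings."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def collected_by_follow(tweet: Dict, followed: Set[str]) -> bool:
    """Return True when a followed-users collection would keep the tweet."""
    # Tweets by protected users are never collected.
    if tweet.get("protected", False):
        return False
    # Tweets created by, or retweeted by, a followed user.
    if tweet["author_id"] in followed:
        return True
    # Replies (button or manual) to a followed user's tweet.
    if tweet.get("in_reply_to_user_id") in followed:
        return True
    # Button Retweets of a followed user's tweet.
    if tweet.get("retweeted_author_id") in followed:
        return True
    # Plain mentions and manual "RT @user ..." tweets carry none of the
    # fields above, so they fall through and are not collected.
    return False


def filter_collected(tweets: List[Dict], raw_ids: str) -> List[Dict]:
    """Keep only the tweets a followed-users collection would contain."""
    followed = parse_user_ids(raw_ids)
    return [tweet for tweet in tweets if collected_by_follow(tweet, followed)]
```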

Data Classification in AIDR
• Define classifiers (name, description)
  – Define labels (name, description)
  – Having a “miscellaneous” category will be helpful
• Wait around 15-20 minutes (for fast collections) or 30-40 minutes (for slow collections)
• Start tagging

Classifier Generation
• Check the classifier status (UI)
  – The first classifier/model will be up after 50 labeled tweets, ideally equally distributed among labels
  – If no model appears after 50 tags, keep tagging
• Human-tagged items (the more the better)
• 40 more needed to re-train (next classifier target)
• Machine-tagged items (keep an eye on misclassifications)
• Quality (ideally 90 < AUC < 100, i.e. above 90 but not exactly 100)
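
While tagging, it can help to track how close the labeled set is to the 50-tweet mark and how evenly the labels are spread. The helper below is a hypothetical bookkeeping sketch, not something AIDR exposes: the function names and the returned dictionary keys are made up for illustration, and AUC is assumed to be reported on a 0 to 100 scale as on the slide.

```python
from collections import Counter
from typing import Dict, List

FIRST_MODEL_LABELS = 50  # first model expected after this many tagged tweets


def labeling_progress(labels: List[str]) -> Dict:
    """Summarize human-tagged items: totals, per-label counts, next target."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        "total": total,
        "remaining_for_first_model": max(0, FIRST_MODEL_LABELS - total),
        "counts": dict(counts),
        # The label with the fewest examples is the one to tag more of.
        "least_tagged": min(counts, key=counts.get) if counts else None,
    }


def auc_in_target_range(auc_percent: float) -> bool:
    """True when AUC is above 90 but not exactly 100."""
    return 90.0 < auc_percent < 100.0
```

A label that stays far behind the others in `counts` is a signal to look for more examples of it before relying on the classifier's output.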