Text Analytics for Security

  • Published on
    06-Jul-2015

DESCRIPTION

Tutorial presented at the 21st ACM Conference on Computer and Communications Security (CCS 2014)

Transcript

<ul><li> 1. Tutorial: Text Analytics for Security. William Enck, North Carolina State University, http://www.enck.org, enck@cs.ncsu.edu. Tao Xie, University of Illinois at Urbana-Champaign, http://web.engr.illinois.edu/~taoxie/, taoxie@illinois.edu</li></ul><p>
2. What is Computer Security?
A computer is secure if you can depend on it and its software to behave as you expect.

3. User Expectations
• User expectations are a form of context.
• Other forms of context for security decisions:
  • Temporal context (e.g., time of day)
  • Environmental context (e.g., location)
  • Execution context
    • OS level (e.g., UID, arguments)
    • Program analysis level (e.g., control flow, data flow)

4. Defining User Expectations
• User expectations are difficult to formally (and even informally) define.
  • Based on an individual's perception, which results from past experiences and education
  • ...so, we can't be perfect
• Starting place: look at the user interface

5. Why Text Analytics?
• The user interface consists of graphics and text
  • End users: includes finding, installing, and running the software (e.g., first run vs. subsequent runs)
  • Developers: includes API documentation, comments in code, and requirements documents
• Goal: process natural-language textual sources to aid security decisions

6. Outline
• Introduction
• Background on text analytics
• Case Study 1: App Markets
• Case Study 2: ACP Rules
• Wrap-up

7. Challenges in Analyzing NL Data
• Unstructured
  • Hard to parse, sometimes wrong grammar
• Ambiguous: often has no defined or precise semantics (as opposed to source code)
  • Hard to understand
• Many ways to represent similar concepts
  • Hard to extract information from
• Example comments expressing similar locking requirements:
  /* We need to acquire the write IRQ lock before calling ep_unlink(). */
  /* Lock must be acquired on entry to this function. */
  /* Caller must hold instance lock! */

8. Why Analyzing NL Data is Easy(?)
• Redundant data
• Easy to get good results for simple tasks
  • Simple algorithms without much tuning effort
• Evolution/version history readily available
• Many techniques to borrow from text analytics: NLP, Machine Learning (ML), Information Retrieval (IR), etc.

9. Text Analytics
(Diagram relating text analytics to Knowledge Rep. &amp; Reasoning / Tagging, Search &amp; DB, Data Analysis, and Computational Linguistics.)
(M. Grobelnik, D. Mladenic)

10. Why Analyzing NL Data is Hard(?)
• Domain-specific words/phrases and meanings
  • "call a function" vs. "call a friend"
  • "computer memory" vs. "human memory"
  • "This method also returns false if path is null"
• Poor quality of text
  • Inconsistent
  • Grammar mistakes
    • "true if path is an absolute path; otherwise false" for the File class in the .NET framework
• Incomplete information

11. Some Major NLP/Text Analytics Tools
• Text Miner, Stanford Parser, Text Analytics for Surveys
• http://uima.apache.org/
• http://nlp.stanford.edu/software/lex-parser.shtml
• http://nlp.stanford.edu/links/statnlp.html
• http://www.kdnuggets.com/software/text.html

12. Dimensions in Text Analytics
• Three major dimensions of text analytics:
  • Representations: from words to partial/full parsing
  • Techniques: from manual work to learning
  • Tasks: from search, over (un-)supervised learning, summarization, ...
(M. Grobelnik, D. Mladenic)

13. Major Text Representations
• Words (stop words, stemming)
• Part-of-speech tags
• Chunk parsing (chunking)
• Semantic role labeling
• Vector space model
(M. Grobelnik, D. Mladenic)

14. Word Properties
• Relations among word surface forms and their senses:
  • Homonymy: same form, but different meaning (e.g., bank: river bank, financial institution)
  • Polysemy: same form, related meaning (e.g., bank: blood bank, financial institution)
  • Synonymy: different form, same meaning (e.g., singer, vocalist)
  • Hyponymy: one word denotes a subclass of another (e.g., breakfast, meal)
• General thesaurus: WordNet, with counterparts in many other languages (e.g., EuroWordNet)
  • http://wordnet.princeton.edu/
  • http://www.illc.uva.nl/EuroWordNet/
(M. Grobelnik, D. Mladenic)

15. Stop Words
• Stop words are words that, from a non-linguistic view, do not carry information
  • they have a mainly functional role
  • we usually remove them to help mining techniques perform better
• Stop words are language dependent; examples:
  • English: A, ABOUT, ABOVE, ACROSS, AFTER, AGAIN, AGAINST, ALL, ALMOST, ALONE, ALONG, ALREADY, ...
(M. Grobelnik, D. Mladenic)

16. Stemming
• Different forms of the same word are usually problematic for text analysis, because they have different spellings but similar meanings (e.g., learns, learned, learning, ...)
• Stemming is a process of transforming a word into its stem (normalized form)
  • stemming provides an inexpensive mechanism to merge ...
(M. Grobelnik, D. Mladenic)

17. Stemming cont.
• For English, the Porter stemmer is the most widely used: http://www.tartarus.org/~martin/PorterStemmer/
• Example cascade rules used in the English Porter stemmer:
  • ATIONAL -&gt; ATE: relational -&gt; relate
  • TIONAL -&gt; TION: conditional -&gt; condition
  • ENCI -&gt; ENCE: valenci -&gt; valence
  • ANCI -&gt; ANCE: hesitanci -&gt; hesitance
  • IZER -&gt; IZE: digitizer -&gt; digitize
  • ABLI -&gt; ABLE: conformabli -&gt; conformable
  • ALLI -&gt; AL: radicalli -&gt; radical
  • ENTLI -&gt; ENT: differentli -&gt; different
  • ELI -&gt; E: vileli -&gt; vile
  • OUSLI -&gt; OUS: analogousli -&gt; analogous
(M. Grobelnik, D. Mladenic)

18. Part-of-Speech Tags
• Part-of-speech tags specify word types, making it possible to differentiate word functions
• For text analysis, part-of-speech tags are used mainly for information extraction, where we are interested in, e.g., named entities (noun phrases)
• Another possible use is reduction of the vocabulary (features)
  • it is known that nouns carry most of the information in text documents
• Part-of-speech taggers are usually learned on manually tagged data
(M. Grobelnik, D. Mladenic)

19. Part-of-Speech Table
• http://www.englishclub.com/grammar/parts-of-speech_1.htm
• http://www.clips.ua.ac.be/pages/mbsp-tags
(M. Grobelnik, D. Mladenic)

20. Part-of-Speech Examples
• http://www.englishclub.com/grammar/parts-of-speech_2.htm
(M. Grobelnik, D. Mladenic)

21. Part of Speech Tags
• http://www2.sis.pitt.edu/~is2420/class-notes/2.pdf

22. Full Parsing
• Parsing provides maximum structural information per sentence
• Input: a sentence; output: a parse tree
• For most text analysis techniques, the information in parse trees is too complex
• Problems with full parsing:
  • Low accuracy
  • Slow
  • Domain specific
(M. Grobelnik, D. Mladenic)

23. Chunk Parsing
• Break text up into non-overlapping contiguous subsets of tokens
  • a.k.a. partial/shallow parsing, light parsing
• What is it useful for?
  • Entity recognition: people, locations, organizations
  • Studying linguistic patterns: gave NP; gave up NP in NP; gave NP NP; gave NP to NP
  • Can ignore complex structure when not relevant
(M. Hearst)

24. Chunk Parsing
• Goal: divide a sentence into a sequence of chunks
• Chunks are non-overlapping regions of a text: [I] saw [a tall man] in [the park]
• Chunks are non-recursive: a chunk cannot contain other chunks
• Chunks are non-exhaustive: not all words are included in the chunks
(S. Bird)

25. Chunk Parsing Techniques
• Chunk parsers usually ignore lexical content
  • They only need to look at part-of-speech tags
• Techniques for implementing chunk parsing
  • E.g., regular expression matching
(S. Bird)

26. Regular Expression Matching
• Define a regular expression that matches the sequences of tags in a chunk
  • A simple noun phrase chunk regexp: &lt;DT&gt;?&lt;JJ&gt;*&lt;NN&gt;
• Chunk all matching subsequences:
  • The/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
  • [The/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN]
• If matching subsequences overlap, the first one gets priority
• Tags: DT: determiner; JJ: adjective; NN: noun, singular or mass; VBD: verb, past tense; IN: preposition/subordinating conjunction
(S. Bird)

27. Semantic Role Labeling: Giving Semantic Labels to Phrases
• [AGENT John] broke [THEME the window]
• [THEME The window] broke
• [AGENT Sotheby's] .. offered [RECIPIENT the Dorrance heirs] [THEME a money-back guarantee]
• [AGENT Sotheby's] offered [THEME a money-back guarantee] to [RECIPIENT the Dorrance heirs]
• [THEME a money-back guarantee] offered by [AGENT Sotheby's]
• [RECIPIENT the Dorrance heirs] will [ARGM-NEG not] be offered [THEME a money-back guarantee]
(S. W. Yih &amp; K. Toutanova)

28. Semantic Role Labeling: Good for Question Answering
• Q: What was the name of the first computer system that defeated Kasparov?
• A: [PATIENT Kasparov] was defeated by [AGENT Deep Blue] [TIME in 1997].
• Q: When was Napoleon defeated?
• Look for: [PATIENT Napoleon] [PRED defeat-synset] [ARGM-TMP *ANS*]
• More generally: ...
(S. W. Yih &amp; K. Toutanova)

29. Typical Semantic Roles
(S. W. Yih &amp; K. Toutanova)

30. Example Semantic Roles
(S. W. Yih &amp; K. Toutanova)

31. Outline
• Introduction
• Background on text analytics
• Case Study 1: App Markets
• Case Study 2: ACP Rules
• Wrap-up

32. Case Study: App Markets
• App markets have played an important role in the popularity of mobile devices
• They provide users with a textual description of each application's functionality
• Examples: Apple App Store, Google Play, Microsoft Windows Phone

33. Current Practice
• Apple: market's responsibility
  • Apple performs manual inspection
• Google: user's responsibility
  • Users approve permissions for security/privacy
  • Bouncer (static/dynamic malware analysis)
• Windows Phone: hybrid
  • Permissions/manual inspection

34. Is Program Analysis Sufficient?
• Previous approaches look at permissions, code, and runtime behaviors
• Caveat: what does the user expect?
  • GPS Tracker: record and send location
  • Phone-call Recorder: record audio during call
  • One-Click Root: exploit vulnerability
  • Others are more subtle

35. Vision
• Goal: bridge the gap between user expectation and app behavior
• WHYPER is a first step in this direction
  • Focus on permissions and app descriptions
  • Limited to permissions that protect user-understandable resources

36. Use Cases
• Enhance user experience while installing apps
• Functionality disclosure during application submission to the market
• Complementing program analysis to ensure more appropriate justifications
(Diagram: Developers, Application Market, WHYPER, Users)

37. Strawman: Keyword Search
• Confounding effects:
  • Certain keywords such as "contact" have a confounding meaning, e.g., "...displays user contacts, ..." vs. "...contact me at abc@xyz.com"
• Semantic interference:
  • Sentences often describe a sensitive operation such as reading contacts without actually referring to the keyword "contact", e.g., "share yoga exercises with your friends via email, sms"

38. WHYPER Framework
(Diagram components: App Description, App Permission, Preprocessor, NLP Parser, Intermediate Representation Generator, FOL Representation, Semantic Engine, API Docs, Semantic Graph Generator, Semantic Graphs, Annotated Description)

39. Preprocessor
• Period handling
  • Decimals, ellipses, shorthand notations (Mr., Dr.)
• Sentence boundaries
  • Tabs, bullet points, delimiters (:)
  • Symbols (*, -) and enumeration sentences
• Named entity handling
  • E.g., "Pandora internet radio"
• Abbreviation handling
  • E.g., "Instant Message (IM)"

40. Intermediate Representation Generator
• Example sentence: "Also you can share yoga exercise to your friends via Email and SMS"
• POS tags: RB PRP MD VB DT NN NN PRP NNS NNP NNP
• Dependencies: advmod, nsubj, aux, dobj, det, nn, prep_to, poss, prep_via, conj_and
• Intermediate representation (diagram): a graph rooted at "share" with nodes you, yoga exercise, friends, email, and SMS, and edge labels "to", "owned", "via", and "and"
• RB: adverb; PRP: pronoun; MD: verb, modal auxiliary; VB: verb, base form; DT: determiner; NN: noun, singular or mass; NNS: noun, plural; NNP: noun, proper singular (http://www.clips.ua.ac.be/pages/mbsp-tags)

41. Semantic-Graph Generator
(Diagram)

42. Semantic-Graph Generator
• Systematic approach to infer graphs
  • Find related API documents using PScout [CCS12]
  • Identify the resource associated with a permission from the API class name
    • e.g., ContactsContract.Contacts
  • Inspect the member variables and member methods to identify actions and subordinate resources
    • e.g., ContactsContract.CommonDataKinds.Email

43. Semantic Engine
• Example: "Also you can share the yoga exercise to your friends via Email and SMS."
• (Diagram: the sentence's intermediate representation, with nodes share, you, yoga exercise, friends, email, and SMS, related to the semantic graph via WordNet similarity)

44. Evaluation
• Subjects
  • Permissions: READ_CONTACTS, READ_CALENDAR, RECORD_AUDIO
  • 581/600* application descriptions (English only)
  • 9,953 sentences
• Research Questions
  • RQ1: What are the precision, recall, and F-score of WHYPER in identifying permission sentences?
  • RQ2: How effective is WHYPER in identifying permission sentences, compared to keyword-based searching?

45. Subject Statistics
Permission | #N (apps) | #S (sentences) | Sp (permission sentences)
READ_CONTACTS | 190 | 3,379 | 235
READ_CALENDAR | 191 | 2,752 | 283
RECORD_AUDIO | 200 | 3,822 | 245
TOTAL | 581 | 9,953 | 763

46. RQ1 Results: Effectiveness
• Out of 9,061 sentences, only 129 were flagged as FPs
• Among 581 apps, 109 apps (18.8%) contain at least one FP
• Among 581 apps, 86 apps (14.8%) contain at least one FN
Permission | SI | TP | FP | FN | TN | Prec. | Recall | F-Score | Acc.
READ_CONTACTS | 204 | 186 | 18 | 49 | 2,930 | 91.2 | 79.2 | 84.8 | 97.9
READ_CALENDAR | 288 | 241 | 47 | 42 | 2,422 | 83.7 | 85.2 | 84.5 | 96.8
RECORD_AUDIO | 259 | 195 | 64 | 50 | 3,470 | 75.3 | 79.6 | 77.4 | 97.0
TOTAL | 751 | 622 | 129 | 141 | 9,061 | 82.8 | 81.5 | 82.2 | 97.3
(SI: sentences identified by WHYPER, i.e., TP + FP; Prec., Recall, F-Score, and Acc. in %)

47. RQ2 Results: Comparison to Keyword-based Search
Permission | Keywords
READ_CONTACTS | contact, data, number, name, email
READ_CALENDAR | calendar, event, date, month, day, year
RECORD_AUDIO | record, audio, voice, capture, microphone

WHYPER improvement over keyword-based search:
Permission | Delta Precision | Delta Recall | Delta F-score | Delta Accuracy
READ_CONTACTS | 50.4 | 1.3 | 31.2 | 7.3
READ_CALENDAR | 39.3 | 1.5 | 26.4 | 9.2
RECORD_AUDIO | 36.9 | -6.6 | 24.3 | 6.8
TOTAL | 41.6 | -1.2 | 27.2 | 7.7

48. Results Analysis: False Positives
• Incorrect parsing
  • "MyLink Advanced provides full synchronization of all Microsoft Outlook emails (inbox, sent, outbox and drafts), contacts, calendar, tasks and notes with all Android phones via USB"
• Synonym analysis
  • "You can now turn recordings into ringtones."

49. Results Analysis: False Negatives
• Incorrect parsing
  • Incorrect identification of sentence boundaries and limitations of the underlying NLP infrastructure
• Limitations of semantic graphs
  • Manual augmentation
    • Microphone (blow into) and call (record)
    • Significant improvement of delta recalls: -6.6% to 0.6%
  • Future: automatic mining from user comments and forums

50. Broader Applicability
• Generalization to other permissions
  • User-understandable permissions: calls, SMS
  • Problem areas
    • Location and phone identifiers (widely abused)
    • Internet (nearly every app requires it)

51. Dataset and Paper
• Our code and datasets are available at https://sites.google.com/site/whypermission/
• Rahul Pandita, Xusheng Xiao, Wei Yang, William Enck, and Tao Xie. WHYPER: Towards Automating Risk Assessment of Mobile Applications. In Proc. 22nd USENIX Security Symposium (USENIX Security 2013). http://www.enck.org/pubs/pandita-sec13.pdf

52. Outline
• Introduction
• Background on text analytics
• Case Study 1: App Markets
• Case Study 2: ACP Rules
• Wrap-up

53. Access Control Policies (ACP)
• Access control is often governed by security policies called Access Control Policies (ACP)
  • Includes rules to control which principals have access to which resources
• Example: "The Health Care Personnel (HCP) does not have the ability to edit the patient's account."
• A policy rule includes four elements:
  • Subject: HCP
  • Action: edit
  • Resource: patient's account
  • Effect: deny

54. Access Control Vulnerabilities
• 2010 Report: 1. Cross-site scripting; 2. SQL injection; 3. Classic buffer overflow; 4. Cross-site request forgery; 5. Improper access control (Authorization); 6. ...
• Improper access control causes problems (e.g., information exposures)
  • Incorrect specification
  • Incorrect enforcement

55. Problems of ACP Practice
• In practice, ACPs are
  • Buried in requirement documents
  • Written in NL and not checkable
• NL documents could be large in size
• Manual extraction is labor-intensive and tedious

56. Overview of Text2Policy
• Pipeline stages: Linguistic Analysis, Model-Instance Construction, Transformation
• Example sentence: "An HCP should not change patient's account."
• Annotated: "An [subject: HCP] should not [action: change] [resource: patient's account]."
• ACP rule: Subject: HCP | Action: UPDATE (change) | Resource: patient's account | Effect: deny

57. Linguistic Analysis
• Incorporate syntactic and semantic analysis
  • syntactic structure -&gt; noun group, verb group, etc.
  • semantic meaning -&gt; subject, action, resource, negative meaning, etc.
• Provide new techniques for model extraction
  • Identify ACP sentences
  • Infer semantic meaning

58. Common Techniques
• Shallow parsing
• Domain dictionary
• Anaphora resolution
• Example: "An HCP can view patient's account. He is disallowed to change the patient's account."
  • Shallow parsing marks the chunks NP, VG, and PNP (subject, main verb group, object)
  • The domain dictionary maps "change" to UPDATE; anaphora resolution maps "He" to HCP
• NP: noun phrase; VG: verb chunk; PNP: prepositional noun phrase (http://www.clips.ua.ac.be/pages/mbsp-tags)

59. Technical Challenges (TC) in ACP Extraction
• ACP1: An HCP cannot change patient's account.
• ACP2: An HCP is disallowed to change patient's account.
• TC1: Semantic Structure Variance
  • different ways to specify the same rule
• TC2: Negative Meaning Implicitness
  • a verb could have a negative meaning

60. Semantic-Pattern Matching
• Addresses TC1 Semantic Structure Variance
• Compose patterns based on grammatical function
  • e.g., "An HCP is disallowed to change the patient's account." (passive voice followed by a to-infinitive phrase)

61. Negative-Expression Identification
• Addresses TC2 Negative Meaning Implicitness
• Negative expressions
  • "not" in subject, e.g., "No HCP can edit patient's account."
  • "not" in verb group, e.g., "HCP cannot edit patient's account." / "HCP can never edit patient's account."
• Negative-meaning words in main verb group
  • e.g., "An HCP is disallowed to change the patient's account."

62. Overview of Text2Policy
An HCP should not change patient's account. An [subject: HCP] should not [action: change] [resource: patient's account]. ACP Rule: Subject | Action | R...</p>
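Slide 17 lists ten of the Porter stemmer's suffix rules, which belong to the second of the algorithm's five steps. The following Python sketch applies only those ten rules and assumes Porter's published conditions for that step: pick the longest matching suffix, and rewrite it only when the remaining stem has a measure m greater than 0. It is not the full stemmer at the tartarus.org link, and the names `STEP2_RULES`, `measure`, and `apply_step2` are hypothetical.

```python
# The ten suffix rules shown on slide 17 (a subset of Porter's step 2).
STEP2_RULES = [
    ("ational", "ate"), ("tional", "tion"), ("enci", "ence"), ("anci", "ance"),
    ("izer", "ize"), ("abli", "able"), ("alli", "al"), ("entli", "ent"),
    ("eli", "e"), ("ousli", "ous"),
]


def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: not a, e, i, o, u, and not a y that follows a consonant."""
    letter = word[i]
    if letter in "aeiou":
        return False
    if letter == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Porter's m in [C](VC)^m[V]: the number of vowel-to-consonant transitions."""
    return sum(
        1
        for i in range(1, len(stem))
        if not is_consonant(stem, i - 1) and is_consonant(stem, i)
    )


def apply_step2(word: str) -> str:
    """Rewrite the longest matching suffix, but only if the stem has m > 0."""
    word = word.lower()
    matches = [rule for rule in STEP2_RULES if word.endswith(rule[0])]
    if not matches:
        return word
    suffix, replacement = max(matches, key=lambda rule: len(rule[0]))
    stem = word[: -len(suffix)]
    return stem + replacement if measure(stem) > 0 else word


for word in ("relational", "conditional", "digitizer", "rational"):
    print(word, "->", apply_step2(word))
# relational -> relate
# conditional -> condition
# digitizer -> digitize
# rational -> rational
```

"relational" matches both ATIONAL and TIONAL; the longest-match rule picks ATIONAL. "rational" is left alone because its stem "r" has m = 0, which is how the m condition keeps very short stems intact.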

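The noun-phrase rule on slide 26 can be prototyped with Python's standard `re` module by running the regular expression over a string of tags instead of words, which is exactly why a chunk parser can ignore lexical content (slide 25). This is a hypothetical stand-alone sketch, not NLTK's `RegexpParser`; the tag-string encoding and the `chunk` helper are assumptions. `re.finditer` returns non-overlapping matches from left to right, which mirrors the rule that the first of two overlapping matches wins.

```python
import re

# The slide's noun-phrase rule: an optional determiner (DT),
# any number of adjectives (JJ), then a singular noun (NN).
NP_RULE = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")


def chunk(tagged, rule=NP_RULE):
    """Bracket every tag subsequence matched by `rule` (hypothetical helper).

    `tagged` is a list of (word, tag) pairs. The regex runs over a string such
    as "<DT><JJ><NN>...", so only the tags, never the words, are inspected.
    """
    tag_string = ""
    offset_to_index = {}  # character offset where each tag starts -> token index
    for index, (_, tag) in enumerate(tagged):
        offset_to_index[len(tag_string)] = index
        tag_string += f"<{tag}>"
    offset_to_index[len(tag_string)] = len(tagged)

    pieces, position = [], 0
    # finditer yields non-overlapping matches, leftmost first.
    for match in rule.finditer(tag_string):
        start, end = offset_to_index[match.start()], offset_to_index[match.end()]
        pieces.extend(f"{w}/{t}" for w, t in tagged[position:start])
        pieces.append("[" + " ".join(f"{w}/{t}" for w, t in tagged[start:end]) + "]")
        position = end
    pieces.extend(f"{w}/{t}" for w, t in tagged[position:])
    return " ".join(pieces)


sentence = [("The", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(chunk(sentence))
# [The/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN]
```

Wrapping each tag in angle brackets keeps matches aligned to whole tags, so the literal `<NN>` cannot accidentally match inside a longer tag such as `<NNS>` or `<NNP>`.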
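Both failure modes of the keyword-search strawman on slide 37 are easy to reproduce with a naive matcher. In this hypothetical sketch a sentence is flagged for READ_CONTACTS when any lowercase token starts with the keyword "contact" (so "contacts" also matches). The one-word keyword list is an assumption chosen to mirror the slide's example; the baseline on slide 47 used longer per-permission lists.

```python
import re

# Hypothetical one-keyword list for READ_CONTACTS, mirroring slide 37.
KEYWORDS = ("contact",)


def keyword_flag(sentence, keywords=KEYWORDS):
    """Return True if any token starts with a keyword (naive prefix match)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(token.startswith(keyword) for token in tokens for keyword in keywords)


# Sentence -> whether it really describes reading the user's contacts.
examples = {
    "... displays user contacts, ...": True,
    "... contact me at abc@xyz.com": False,  # confounding effect
    "share yoga exercises with your friends via email, sms": True,  # semantic interference
}
for sentence, reads_contacts in examples.items():
    flagged = keyword_flag(sentence)
    if flagged == reads_contacts:
        verdict = "ok"
    else:
        verdict = "false positive" if flagged else "false negative"
    print(f"{verdict:15} {sentence}")
```

The loop prints "ok", "false positive", and "false negative" for the three sentences in order: the second sentence uses "contact" as a verb, and the third describes sharing with contacts without the keyword ever appearing.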

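The percentages in the slide-46 table follow from the confusion counts through the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F-score = 2PR / (P + R), and accuracy = (TP + TN) / (TP + FP + FN + TN). The sketch below recomputes the TOTAL row; the `Confusion` class is a hypothetical helper, not part of WHYPER.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """True/false positive and negative counts for one evaluation."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)  # TP + FP = SI, sentences identified

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)  # TP + FN = actual permission sentences

    @property
    def f_score(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r)  # harmonic mean of precision and recall

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


total = Confusion(tp=622, fp=129, fn=141, tn=9061)  # TOTAL row of slide 46
for name in ("precision", "recall", "f_score", "accuracy"):
    print(f"{name}: {100 * getattr(total, name):.1f}")
# precision: 82.8
# recall: 81.5
# f_score: 82.2
# accuracy: 97.3
```

As a consistency check, TP + FN = 763 matches the Sp total on slide 45, and TP + FP + FN + TN = 9,953 matches the total sentence count.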
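Slides 56 to 61 describe how Text2Policy turns an ACP sentence into a (subject, action, resource, effect) rule. The following is a deliberately small, hypothetical approximation, not Text2Policy itself. Instead of shallow parsing and grammatical-function patterns, it uses four surface regular expressions, one per sentence form shown on the slides. It also assumes a domain dictionary (change and edit map to UPDATE, view maps to READ) and uses a crude "reuse the previous subject" heuristic in place of real anaphora resolution.

```python
import re
from typing import List, NamedTuple, Optional


class AcpRule(NamedTuple):
    subject: str
    action: str
    resource: str
    effect: str


# Hypothetical domain dictionary (slide 56 maps "change" to UPDATE).
ACTIONS = {"change": "UPDATE", "edit": "UPDATE", "view": "READ"}
# Hypothetical negative-meaning verbs for a passive main verb group (TC2).
NEGATIVE_VERBS = "disallowed|prohibited|forbidden"
PRONOUNS = {"he", "she", "they"}
TAIL = r"(?P<verb>\w+) (?P<res>.+?)\.$"

# (pattern, effect) pairs; the first pattern that matches wins.
PATTERNS = [
    # "not" in the subject: "No HCP can edit patient's account."
    (re.compile(r"^No (?P<subj>\w+) (?:can|may) " + TAIL, re.I), "deny"),
    # "not" in the verb group: "HCP cannot / can never / should not edit ..."
    (re.compile(r"^(?:An? )?(?P<subj>\w+) "
                r"(?:cannot|can not|can never|should not|must not) " + TAIL, re.I), "deny"),
    # Passive voice + to-infinitive with a negative-meaning verb.
    (re.compile(r"^(?:An? )?(?P<subj>\w+) is (?:" + NEGATIVE_VERBS + r") to " + TAIL, re.I), "deny"),
    # Plain permission: "An HCP can view patient's account."
    (re.compile(r"^(?:An? )?(?P<subj>\w+) (?:can|may) " + TAIL, re.I), "permit"),
]


def extract_rule(sentence: str) -> Optional[AcpRule]:
    """Map one sentence to an ACP rule, or None if no pattern applies."""
    for pattern, effect in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            verb = match.group("verb").lower()
            # Drop a leading article so "the patient's account" == "patient's account".
            resource = re.sub(r"^(?:the|an?) ", "", match.group("res"), flags=re.I)
            return AcpRule(match.group("subj"), ACTIONS.get(verb, verb.upper()), resource, effect)
    return None


def extract_rules(text: str) -> List[AcpRule]:
    """Extract rules sentence by sentence; a pronoun subject is replaced by the
    subject of the most recently extracted rule (crude anaphora stand-in)."""
    rules: List[AcpRule] = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        rule = extract_rule(sentence)
        if rule is None:
            continue
        if rule.subject.lower() in PRONOUNS and rules:
            rule = rule._replace(subject=rules[-1].subject)
        rules.append(rule)
    return rules


for rule in extract_rules("An HCP can view patient's account. "
                          "He is disallowed to change the patient's account."):
    print(rule)
# AcpRule(subject='HCP', action='READ', resource="patient's account", effect='permit')
# AcpRule(subject='HCP', action='UPDATE', resource="patient's account", effect='deny')
```

ACP1 and ACP2 from slide 59 match different patterns yet yield the same rule, which is the point of TC1. Surface patterns like these break on any phrasing they do not anticipate, which is why the slides build patterns on grammatical function instead.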