Text Analytics for Mobile App Security and Beyond
Tao Xie, University of Illinois at Urbana-Champaign, taoxie@illinois.edu


Page 1: Text Analytics for Mobile App Security and Beyond

Tao Xie, University of Illinois at Urbana-Champaign

taoxie@illinois.edu

Page 2

Mobile App Markets

Apple App Store, Google Play, Microsoft Windows Phone

Page 3

App Store beyond Mobile Apps!

Page 4

What If Formal Specs Are Written?!

• App developers
  – App Functional Requirements (informal: app description, etc.)
  – App Security Requirements (informal: permission list, etc.)
• App users
  – User Functional Requirements
  – User Security Requirements

Page 5

Informal App Functional Requirements: App Description

App Code

App Permissions

Page 6

App Security Requirements: Permission List

Page 7

What If Formal Specs Are Written?!


Page 8

Example Android App: Angry Birds

Page 9

What If Formal Specs Are Written?!

In reality, few of the app and user functional/security requirements are (formally) specified! Hope: bring humans into the loop (user perception + judgment).

Page 10

Our Yin-Yang View on Mobile App Security

[Figure: User-Perceived Information (App Description [functional], App Permissions [security]; also App UIs, App categories, App metadata, User forums, …) vs. App Security Behavior (App Code)]

o Reason about user-perceived info, e.g., WHYPER
o Push app security behavior across the boundary
o Check consistency across the boundary
o Reduce user judgment effort

Page 11

Assuring Market Security/Privacy

o Apple (market’s responsibility)
  o Apple performs manual inspection
o Google (user’s responsibility)
  o Users approve permissions for security/privacy
  o Bouncer (static/dynamic malware analysis)
o Windows Phone (hybrid)
  o Permissions / manual inspection

Page 12

Need More Than Program Analysis

o Previous approaches look at permissions and code (runtime behaviors)
o What do users expect?
  o GPS Tracker: record and send location
  o Phone-Call Recorder: record audio during phone calls

[Figure: App Description, App Code, App Permissions]

Page 13

o User expectations
  o User perception + user judgment
o Focus on permissions and app descriptions
  o Permissions (protecting user-understandable resources) should be discussed

Vision: “Bridging the gap between user expectations and app behaviors”

Linkage: app description sentence ↔ permission

Page 14

WHYPER Overview

[Figure: Application Market, WHYPER, developers, users]

• Enhance user experience while installing apps
• Enforce functionality disclosure on developers
• Complement program analysis to ensure justifications

Pandita et al. WHYPER: Towards Automating Risk Assessment of Mobile Applications. USENIX Security 2013. http://web.engr.illinois.edu/~taoxie/publications/usenixsec13-whyper.pdf

Page 15

Example Sentence in App Description

• E.g., “Also you can share the yoga exercise to your friends via Email and SMS.”
  – Implies use of the contact permission
  – Such sentences are permission sentences
• Keyword-based search on application descriptions

Page 16

Problems with Ctrl + F

• Confounding effects:
  – Certain keywords such as “contact” have a confounding meaning
  – E.g., “... displays user contacts, ...” vs. “... contact me at [email address]”
• Semantic inference:
  – Sentences often describe a sensitive operation without actually referring to keywords
  – E.g., “share yoga exercises with your friends via Email and SMS”
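A tiny keyword baseline makes both problems concrete. This is an illustration, not the baseline from the WHYPER evaluation: it assumes case-insensitive substring matching, borrows the READ_CONTACTS keyword list from the evaluation slide, and uses hypothetical sentences modeled on the examples above (the last one avoids the word “email”, which is itself a READ_CONTACTS keyword).

```python
# Keyword-based ("Ctrl + F") baseline, sketched under assumptions:
# a sentence is flagged if it contains any keyword (case-insensitive substring).
READ_CONTACTS_KEYWORDS = ["contact", "data", "number", "name", "email"]


def keyword_flags(sentence: str, keywords: list[str]) -> bool:
    """Return True if any keyword occurs in the sentence."""
    text = sentence.lower()
    return any(keyword in text for keyword in keywords)


# Hypothetical sentences mapped to whether they really imply READ_CONTACTS.
sentences = {
    "This app displays user contacts, sorted by name.": True,
    "Please contact me with any questions.": False,      # confounding effect
    "Share yoga exercises with your friends via SMS.": True,  # semantic inference
}

for sentence, expected in sentences.items():
    flagged = keyword_flags(sentence, READ_CONTACTS_KEYWORDS)
    print(f"flagged={flagged!s:<5} expected={expected!s:<5} {sentence}")
```

The second sentence comes out as a false positive and the third as a false negative, one for each problem listed on the slide.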

Page 17

Natural Language Processing

• Natural Language Processing (NLP) techniques help computers understand NL artifacts
• In general, NLP is still difficult
• NLP on domain-specific sentences with specific styles is feasible
  – Text2Policy: extraction of security policies from use cases [FSE 12]
  – APIInfer: inferring contracts from API docs [ICSE 12]
  – WHYPER: domain knowledge from API docs [USENIX Security 13]

Page 18

Overview of WHYPER

[Figure: the app description passes through the Preprocessor, NLP Parser, and Intermediate-Representation Generator (producing a FOL representation) into the Semantic Engine, which outputs an annotated description. For each app permission, the Semantic-Graph Generator derives semantic graphs (domain knowledge) from API docs and feeds them to the Semantic Engine.]

Page 19

Preprocessor

• Period handling
  – Decimals, ellipses, shorthand notations (Mr., Dr.)
• Sentence boundaries
  – Tabs, bullet points, delimiters (:)
  – Symbols (*, -) and enumeration sentences
• Named-entity handling
  – E.g., “Pandora internet radio”
• Abbreviation handling
  – E.g., “Instant Message (IM)”
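The period-handling and sentence-boundary rules can be sketched with regular expressions. This is a minimal illustration, not WHYPER's preprocessor: the shorthand list is hypothetical, non-terminal periods are hidden behind a placeholder before splitting, and named-entity and abbreviation handling are left out.

```python
import re

# Hypothetical shorthand notations whose periods never end a sentence.
SHORTHANDS = ["Mr.", "Mrs.", "Dr.", "e.g.", "i.e.", "etc."]
PLACEHOLDER = "<DOT>"


def protect_periods(text: str) -> str:
    """Hide periods that do not end a sentence (shorthands, decimals, ellipses)."""
    for shorthand in SHORTHANDS:
        text = text.replace(shorthand, shorthand.replace(".", PLACEHOLDER))
    text = re.sub(r"(\d)\.(\d)", r"\1" + PLACEHOLDER + r"\2", text)  # 2.5
    return text.replace("...", PLACEHOLDER * 3)  # ellipsis


def split_sentences(description: str) -> list[str]:
    """Split an app description into sentence-like units."""
    text = protect_periods(description)
    sentences = []
    # Tabs, line breaks, and ':' delimiters start a new unit.
    for chunk in re.split(r":\s+|[\t\n]+", text):
        chunk = chunk.strip().lstrip("*-•").strip()  # drop bullet symbols
        # '.', '!' or '?' followed by whitespace ends a sentence.
        for part in re.split(r"(?<=[.!?])\s+", chunk):
            part = part.replace(PLACEHOLDER, ".").strip()
            if part:
                sentences.append(part)
    return sentences


description = ("Features:\n* Record calls in 2.5 seconds.\n"
               "* Dr. Smith recommends it... Try it now! Share with friends.")
for sentence in split_sentences(description):
    print(sentence)
```

Hiding the non-terminal periods first keeps the final split rule simple: every remaining period followed by whitespace is treated as a real sentence end.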

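Page 20

Intermediate-Representation Generator

[Figure: processing of “Also you can share the yoga exercise to your friends via Email and SMS”]

• POS tags: Also/RB you/PRP can/MD share/VB the/DT yoga/NN exercise/NN your/PRP friends/NNS Email/NNP SMS/NNP
• Typed dependencies: advmod(share, Also), nsubj(share, you), aux(share, can), dobj(share, exercise), det(exercise, the), nn(exercise, yoga), prep_to(share, friends), poss(friends, your), prep_via(share, Email), conj_and(Email, SMS)
• Intermediate representation (predicates with governing and dependent entities): share(you, yoga exercise); to(share, friends); owned(friends, you); via(share, Email and SMS)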
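The step from typed dependencies to predicate/entity triples can be sketched as follows. The dependency list is transcribed from the slide's parse of the yoga sentence; the conversion rules (noun compounds merged into one entity, one triple per conjunct of a prepositional object, possessives turned into an `owned` relation) and the `POSSESSIVE_OWNER` table are simplifying assumptions, not WHYPER's actual generator.

```python
# Typed dependencies from the slide: (relation, governor, dependent).
DEPENDENCIES = [
    ("advmod", "share", "Also"), ("nsubj", "share", "you"),
    ("aux", "share", "can"), ("dobj", "share", "exercise"),
    ("det", "exercise", "the"), ("nn", "exercise", "yoga"),
    ("prep_to", "share", "friends"), ("poss", "friends", "your"),
    ("prep_via", "share", "Email"), ("conj_and", "Email", "SMS"),
]

# Hypothetical mapping from possessive pronouns to their owner entity.
POSSESSIVE_OWNER = {"your": "you", "my": "I", "his": "he", "her": "she"}


def to_intermediate(deps):
    """Turn typed dependencies into (predicate, governing, dependent) triples."""
    compounds, conjuncts = {}, {}
    for rel, gov, dep in deps:
        if rel == "nn":          # noun compound: exercise <- yoga
            compounds.setdefault(gov, []).append(dep)
        elif rel == "conj_and":  # coordination: Email <- SMS
            conjuncts.setdefault(gov, []).append(dep)

    def entity(word):
        return " ".join(compounds.get(word, []) + [word])

    triples = []
    for rel, gov, dep in deps:
        if rel == "dobj":
            subjects = [d for r, g, d in deps if r == "nsubj" and g == gov]
            triples.append((gov, subjects[0] if subjects else None, entity(dep)))
        elif rel.startswith("prep_"):
            # One triple per conjunct of the prepositional object.
            for target in [dep] + conjuncts.get(dep, []):
                triples.append((rel[len("prep_"):], gov, entity(target)))
        elif rel == "poss":
            triples.append(("owned", entity(gov), POSSESSIVE_OWNER.get(dep, dep)))
    return triples


for triple in to_intermediate(DEPENDENCIES):
    print(triple)
```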

Page 21

Semantic Engine

[Figure: the intermediate representation of the example sentence (share; you; yoga exercise; to friends owned by you; via Email and SMS) is matched against a semantic graph inferred from API docs; “share” and “Email” are matched using WordNet similarity]
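Matching a sentence's intermediate representation against a permission's semantic graph can be sketched like this. WHYPER uses WordNet-based similarity; here a tiny hand-written synonym/lemma table stands in for it so the example has no external dependencies, and the READ_CONTACTS graph contents are hypothetical.

```python
# Hypothetical semantic graph for READ_CONTACTS (illustrative entries only).
CONTACTS_GRAPH = {
    "actions": {"read", "share", "send"},
    "resources": {"contact", "email", "phone number"},
}

# Stand-in for WordNet similarity: a tiny synonym/lemma table (assumption).
SYNONYMS = {"mail": "email", "e-mail": "email", "forward": "send", "contacts": "contact"}
PREPOSITIONS = {"to", "via", "with", "from"}


def normalize(word: str) -> str:
    word = word.lower()
    return SYNONYMS.get(word, word)


def is_permission_sentence(triples, graph) -> bool:
    """True if some (action, entity) pair in the sentence hits the semantic graph."""
    for predicate, governor, dependent in triples:
        # In a prepositional triple such as ("via", "share", "Email"),
        # the action is the governing predicate.
        action = governor if predicate in PREPOSITIONS else predicate
        if (normalize(action) in graph["actions"]
                and normalize(dependent) in graph["resources"]):
            return True
    return False


# Intermediate representation of the yoga example sentence.
yoga = [("share", "you", "yoga exercise"), ("to", "share", "friends"),
        ("owned", "friends", "you"), ("via", "share", "Email"), ("via", "share", "SMS")]
print(is_permission_sentence(yoga, CONTACTS_GRAPH))  # True: (share, Email) matches
```

A confounding sentence such as “contact me at …” yields a predicate `contact`, which is not an action in the graph, so it is not flagged.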

Page 22

Semantic-Graph Generator

o Systematic approach to infer graphs
  o Identify the resource associated with the permission from the API class name
    o E.g., ContactsContract.Contacts
  o Inspect the member variables and member methods to identify actions and subordinate resources
    o E.g., ContactsContract.CommonDataKinds.Email
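A rough version of this idea: split API class and member names into words, take the class name as the resource, nested classes as subordinate resources, and the first word of each method name as an action. The method names in the usage example are illustrative inputs, and the naive singularization and "first word is the verb" heuristic are assumptions, not WHYPER's rules.

```python
import re


def split_camel_case(identifier: str) -> list[str]:
    """'openContactPhotoInputStream' -> ['open', 'contact', 'photo', 'input', 'stream']"""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", identifier)]


def singular(word: str) -> str:
    # Naive singularization (assumption): 'contacts' -> 'contact'.
    return word[:-1] if word.endswith("s") and not word.endswith("ss") else word


def build_semantic_graph(class_name: str, nested: list[str], members: list[str]) -> dict:
    """Resource from the class name, subordinate resources from nested classes,
    actions from the leading word of member method names."""
    last = class_name.split(".")[-1]
    resource = " ".join(singular(w) for w in split_camel_case(last))
    subordinates = sorted({" ".join(split_camel_case(n.split(".")[-1])) for n in nested})
    actions = sorted({split_camel_case(m)[0] for m in members})
    return {"resource": resource, "subordinate_resources": subordinates, "actions": actions}


graph = build_semantic_graph(
    "ContactsContract.Contacts",
    ["ContactsContract.CommonDataKinds.Email", "ContactsContract.CommonDataKinds.Phone"],
    ["openContactPhotoInputStream", "getLookupUri", "lookupContact"],  # illustrative
)
print(graph)
```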

Page 23

Evaluation

• Subjects
  – Permissions: READ_CONTACTS, READ_CALENDAR, RECORD_AUDIO
  – 581 application descriptions
  – 9,953 sentences
• Evaluation setup
  – Manual annotation of the sentences
  – WHYPER for identifying permission sentences
  – Comparison to keyword-based searching

Page 24

Evaluation Results

• Precision and recall of WHYPER
  – Average precision of 82.8% and recall of 81.5%
• Comparison to keyword-based searching
  – Precision improves by 41.6%, while recall changes by -1.2%
  – Recall losses come from cases such as microphone/“blow into” and call-record

Keywords used for keyword-based searching:
  READ_CONTACTS: contact, data, number, name, email
  READ_CALENDAR: calendar, event, date, month, day, year
  RECORD_AUDIO: record, audio, voice, capture, microphone

Page 25

Access Control Policies (ACP) in Requirements Document

• Access control is often governed by security policies called Access Control Policies (ACP)
  – Include rules to control which principals have access to which resources
• A policy rule includes four elements
  – Subject: HCP
  – Action: edit
  – Resource: patient's account
  – Effect: deny

ex. “The Health Care Personnel (HCP) does not have the ability to edit the patient's account.”

Page 26

Overview of Text2Policy

[Figure: Linguistic Analysis → Model-Instance Construction → Transformation]

• Input: “An HCP should not change patient’s account.”
• Linguistic analysis: An [subject: HCP] should not [action: change] [resource: patient’s account].
• ACP rule: Subject = HCP; Action = UPDATE - change; Resource = patient’s account; Effect = deny

Xiao et al. Automated Extraction of Security Policies from Natural-Language Software Documents. FSE 2012. http://web.engr.illinois.edu/~taoxie/publications/fse12-nlp.pdf

Page 27

Example Technical Challenges in ACP Extraction

• Semantic structure variance
  – Different ways to specify the same rule
• Negative meaning implicitness
  – A verb could have a negative meaning

ACP 1: An HCP cannot change patient’s account.
ACP 2: An HCP is disallowed to change patient’s account.

Page 28

Road Ahead: Yin-Yang View

[Figure: User-Perceived Information (App Description [functional], App Permissions [security]; also App UIs, App categories, App metadata, User forums, …) vs. App Security Behavior (App Code)]

o Reason about user-perceived info, e.g., WHYPER
o Push app security behavior across the boundary
o Check consistency across the boundary
o Reduce user judgment effort

Page 29

Text Analytics for Mobile App Security and Beyond

[Figure: App Description, App Code, App Permissions; App UIs, App categories, App metadata, User forums, …]

taoxie@illinois.edu

Acknowledgments: Supported in part by NSA Science of Security (SoS) Lablet, NSF SaTC, NSF SHF, NSF CAREER


Page 31

Problems with Ctrl + F

o Confounding effects:
  o Certain keywords such as “contact” have a confounding meaning.
  o For instance, “... displays user contacts, ...” vs. “... contact me at [email address]”.
o Semantic inference:
  o Sentences often describe a sensitive operation such as reading contacts without actually referring to the keyword “contact”.
  o For instance, “share yoga exercises with your friends via email, sms”.

Page 32

Natural Language Processing (NLP)

• NLP techniques help computers understand NL artifacts
• NLP is still difficult
• NLP on domain-specific sentences with specific styles is feasible

Page 33

RQ1 Results: Effectiveness of WHYPER

• Low FPs and FNs
  – Out of 9,061 sentences, only 129 are flagged as FPs
  – Among 581 applications, 109 (18.8%) contain at least one FP
  – Among 581 applications, 86 (14.8%) contain at least one FN

SI = sentences identified by WHYPER (TP + FP); Prec., Recall, F-Score, and Acc. are in %.

Permission      SI   TP   FP   FN   TN     Prec.  Recall  F-Score  Acc.
READ_CONTACTS   204  186  18   49   2,930  91.2   79.2    84.8     97.9
READ_CALENDAR   288  241  47   42   2,422  83.7   85.2    84.5     96.8
RECORD_AUDIO    259  195  64   50   3,470  75.3   79.6    77.4     97.0
TOTAL           751  622  129  141  9,061  82.8   81.5    82.2     97.3
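The derived columns follow from the counts: precision = TP/(TP+FP), recall = TP/(TP+FN), F-score = 2·P·R/(P+R), accuracy = (TP+TN)/(TP+FP+FN+TN). The sketch recomputes them from the table's counts. It reproduces the TOTAL and RECORD_AUDIO rows; for READ_CONTACTS (recall, F-score) and READ_CALENDAR (F-score) it comes out 0.1 lower than the printed values, and the per-permission TN counts sum to 8,822 rather than the 9,061 in the TOTAL row.

```python
# Counts from the RQ1 table: (TP, FP, FN, TN) per permission.
COUNTS = {
    "READ_CONTACTS": (186, 18, 49, 2930),
    "READ_CALENDAR": (241, 47, 42, 2422),
    "RECORD_AUDIO": (195, 64, 50, 3470),
    "TOTAL": (622, 129, 141, 9061),
}


def metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Precision, recall, F-score, and accuracy in percent, rounded to 0.1."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    values = {"precision": precision, "recall": recall,
              "f_score": f_score, "accuracy": accuracy}
    return {name: round(100 * value, 1) for name, value in values.items()}


for permission, counts in COUNTS.items():
    print(permission, metrics(*counts))
```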

Page 34

Result Analysis (False Positives)

• Incorrect parsing
  – “MyLink Advanced provides full synchronization of all Microsoft Outlook emails (inbox, sent, outbox and drafts), contacts, calendar, tasks and notes with all Android phones via USB”
• Synonym analysis
  – “You can now turn recordings into ringtones.”

Page 35

Result Analysis (False Negatives)

• Incorrect parsing
  – Incorrect identification of sentence boundaries and limitations of the underlying NLP infrastructure
• Limitations of semantic graphs
  – Manual augmentation (e.g., microphone/“blow into” and call-record) significantly improves delta recall: from -6.6% to 0.6%
  – Automatic mining from user comments and forums

Page 36

Overview of Text2Policy


Page 37

Linguistic Analysis

• Incorporate syntactic and semantic analysis
  – Syntactic structure -> noun group, verb group, etc.
  – Semantic meaning -> subject, action, resource, negative meaning, etc.
• Provide new techniques for model extraction
  – Identify ACP and action-step (AS) sentences
  – Infer semantic meaning

Page 38

Common Techniques

• Shallow parsing
• Domain dictionary
• Anaphora resolution

ex. “An HCP can view patient’s account. He is disallowed to change the patient’s account.”

[Figure: shallow parsing of the second sentence into Subject (NP: “He”, resolved to HCP), Main Verb Group (VG: “is disallowed to change”, mapped to UPDATE), and Object (PNP: “the patient’s account”)]
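A minimal sketch of two of these techniques on the example, assuming a shallow parser has already produced (subject, verb group, object) chunks: pronoun subjects are resolved to the previous non-pronoun subject, and the main verb is looked up in a domain dictionary. The dictionary contents are hypothetical (change → UPDATE and view → OUTPUT echo the other slides), and real anaphora resolution is far more involved.

```python
PRONOUNS = {"he", "she", "they"}
DETERMINERS = {"a", "an", "the"}

# Hypothetical domain dictionary: main verb -> operation category.
DOMAIN_DICTIONARY = {"change": "UPDATE", "edit": "UPDATE", "update": "UPDATE",
                     "view": "OUTPUT", "display": "OUTPUT"}


def strip_determiner(noun_phrase: str) -> str:
    words = noun_phrase.split()
    if words and words[0].lower() in DETERMINERS:
        words = words[1:]
    return " ".join(words)


def resolve_and_classify(chunks):
    """chunks: (subject, verb group, object) triples from a shallow parser.
    Pronoun subjects become the most recent non-pronoun subject (naive anaphora
    resolution); the last word of the verb group is the main verb."""
    last_subject = None
    resolved = []
    for subject, verb_group, obj in chunks:
        if subject.lower() in PRONOUNS and last_subject is not None:
            subject = last_subject
        else:
            subject = strip_determiner(subject)
            last_subject = subject
        main_verb = verb_group.split()[-1].lower()
        operation = DOMAIN_DICTIONARY.get(main_verb, "UNKNOWN")
        resolved.append((subject, operation, main_verb, obj))
    return resolved


sentences = [
    ("An HCP", "can view", "patient's account"),
    ("He", "is disallowed to change", "the patient's account"),
]
for row in resolve_and_classify(sentences):
    print(row)
```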

Page 39

Technical Challenges (TC) in ACP Extraction

• TC1: Semantic Structure Variance
  – Different ways to specify the same rule
• TC2: Negative Meaning Implicitness
  – A verb could have a negative meaning

ACP 1: An HCP cannot change patient’s account.
ACP 2: An HCP is disallowed to change patient’s account.

Page 40

Semantic-Pattern Matching

• Address TC1 Semantic Structure Variance
• Compose patterns based on grammatical function

ex. “An HCP is disallowed to change the patient’s account.”: passive voice followed by a to-infinitive phrase
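The pattern “passive voice followed by a to-infinitive phrase” can be approximated with a regular expression. Text2Policy composes patterns over grammatical functions from a parser; this sketch instead matches surface text and checks the participle against a hypothetical word list, so it only illustrates the idea.

```python
import re

# Hypothetical past participles that may head such a passive construction.
PARTICIPLES = {"allowed", "disallowed", "permitted", "prohibited", "forbidden"}

# Form of "be", optional "not", a participle, then "to <verb> <resource>".
PASSIVE_TO_INFINITIVE = re.compile(
    r"^(?P<subject>.+?)\s+(?:is|are|was|were|be)\s+(?P<neg>not\s+)?"
    r"(?P<participle>\w+)\s+to\s+(?P<action>\w+)\s+(?P<resource>.+?)\.?$",
    re.IGNORECASE,
)


def match_passive_to_infinitive(sentence: str):
    m = PASSIVE_TO_INFINITIVE.match(sentence.strip())
    if not m or m.group("participle").lower() not in PARTICIPLES:
        return None
    return {
        "subject": m.group("subject"),
        "participle": m.group("participle").lower(),
        "negated": m.group("neg") is not None,
        "action": m.group("action"),
        "resource": m.group("resource"),
    }


print(match_passive_to_infinitive("An HCP is disallowed to change the patient’s account."))
print(match_passive_to_infinitive("An HCP cannot change patient’s account."))  # None
```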

Page 41

Negative-Expression Identification

• Address TC2 Negative Meaning Implicitness
• Negative expression
  – Negation in subject: ex. “No HCP can edit patient’s account.”
  – Negation in verb group: ex. “HCP can not edit patient’s account.”, “HCP can never edit patient’s account.”
• Negative-meaning words in main verb group
  – ex. “An HCP is disallowed to change the patient’s account.”
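The three kinds of negative expression can be detected on an already chunked sentence, as in this sketch. The negator and negative-verb lists are hypothetical and far from complete.

```python
import re

# Hypothetical word lists for each kind of negative expression.
SUBJECT_NEGATORS = {"no", "none", "nobody"}
VERB_NEGATORS = {"not", "never", "cannot", "can't"}
NEGATIVE_MEANING_VERBS = {"disallowed", "prohibited", "forbidden", "denied"}


def negative_expressions(subject: str, verb_group: str) -> dict[str, bool]:
    """Report which kinds of negative expression a chunked sentence contains."""
    subject_words = subject.lower().split()
    verb_words = re.findall(r"[a-z']+", verb_group.lower())
    return {
        "in_subject": bool(subject_words) and subject_words[0] in SUBJECT_NEGATORS,
        "in_verb_group": any(w in VERB_NEGATORS for w in verb_words),
        "negative_verb": any(w in NEGATIVE_MEANING_VERBS for w in verb_words),
    }


examples = [("No HCP", "can edit"), ("HCP", "can not edit"),
            ("HCP", "can never edit"), ("An HCP", "is disallowed to change")]
for subject, verb_group in examples:
    print(subject, "|", verb_group, "|", negative_expressions(subject, verb_group))
```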

Page 42

AS: Syntactic-Pattern Matching

• Syntactic elements
  – Subject, main verb, object
• Subject and object checking
  – Filter out sentences whose subject is not a user or whose object is not a resource
• Filtering negative-meaning sentences
  – Negative sentences tend not to describe ASs

ex. “The prescription list should include medication, the name of the doctor. . .” (the subject is not a user)
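The subject/object checks and the negative-sentence filter can be sketched as a predicate over a parsed (subject, object) pair. The `USERS` and `RESOURCES` sets are hypothetical stand-ins for a domain dictionary, and head-phrase extraction is reduced to dropping determiners and possessives.

```python
# Hypothetical domain dictionaries of users and resources.
DETERMINERS = {"a", "an", "the"}
USERS = {"hcp", "patient", "admin"}
RESOURCES = {"account", "access log", "medical record"}


def head_phrase(phrase: str) -> str:
    """Drop determiners and possessive modifiers: "the patient's account" -> "account"."""
    words = [w for w in phrase.lower().split() if w not in DETERMINERS]
    return " ".join(w for w in words if not w.endswith(("'s", "’s")))


def is_action_step(subject: str, obj: str, negative: bool) -> bool:
    """Keep a sentence as an action step only if its subject is a user, its object
    is a resource, and it carries no negative meaning."""
    if negative:
        return False  # negative sentences tend not to describe action steps
    return head_phrase(subject) in USERS and head_phrase(obj) in RESOURCES


print(is_action_step("The patient", "the access log", False))       # True
print(is_action_step("The prescription list", "medication", False))  # False
```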

Page 43

Overview of Text2Policy


Page 44

ACP Model-Instance Construction

ex. “An HCP is disallowed to change the patient’s account.”

• Identify subject, action, and resource:
  – Subject: HCP
  – Action: change
  – Resource: patient’s account
• Infer effect:
  – Negative expression: none
  – Negative verb: disallow
  – Inferred effect: deny
• ACP rule: Subject = HCP; Action = UPDATE - change; Resource = patient’s account; Effect = deny
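Model-instance construction for the example can be sketched as below. The word lists are hypothetical, and the rule that two negations cancel out (“is not disallowed to” yields permit) is an assumption for illustration; the slide only shows the single-negation case.

```python
from dataclasses import dataclass

# Hypothetical word lists; "disallowed" and change -> UPDATE follow the slide.
NEGATORS = {"not", "never", "cannot"}
NEGATIVE_VERBS = {"disallowed", "prohibited", "forbidden"}
OPERATIONS = {"change": "UPDATE", "edit": "UPDATE"}
SKIP_WORDS = {"a", "an", "the", "no"}


@dataclass
class ACPRule:
    subject: str
    action: str    # "<OPERATION> - <verb>", as in the slide's ACP rule
    resource: str
    effect: str    # "permit" or "deny"


def strip_leading(phrase: str) -> str:
    words = phrase.split()
    while words and words[0].lower() in SKIP_WORDS:
        words = words[1:]
    return " ".join(words)


def build_acp_rule(subject: str, verb_group: str, resource: str) -> ACPRule:
    verb_words = verb_group.lower().split()
    main_verb = verb_words[-1]
    negative_expression = (subject.lower().split()[0] == "no"
                           or any(w in NEGATORS for w in verb_words))
    negative_verb = any(w in NEGATIVE_VERBS for w in verb_words)
    # Assumption: one source of negation means deny; two cancel out.
    effect = "deny" if negative_expression != negative_verb else "permit"
    action = f"{OPERATIONS.get(main_verb, 'UNKNOWN')} - {main_verb}"
    return ACPRule(strip_leading(subject), action, strip_leading(resource), effect)


print(build_acp_rule("An HCP", "is disallowed to change", "the patient’s account"))
```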

Page 45

AS Model-Instance Construction

• Use case patterns
  – Industry use cases [DSN’09]
  – Public use cases
• Model-instance construction
  – ex. “The patient views access log.”
  – Action step: Actor = patient; Action = OUTPUT - view; Resource = access log

Page 46

Technical Challenges in Action-Step Extraction

• TC4: Transitive Subject
• TC5: Perspective Variance

AS 1: He [HCP] edits the account.
AS 2: The system [HCP] updates the account.
AS 3: The system displays the updated account. [→ HCP views the updated account.]

Page 47

Subject Flow Tracking

• Address TC4 Transitive Subject
• Apply data flow to track non-system subjects:
  AS 1: The HCP edits the account.
  AS 2: The system updates the account.
• Tracking: a step with only the system as subject (AS 2) has its subject replaced with HCP

Page 48

Perspective Conversion

• Address TC5 Perspective Variance
• Apply data flow to track non-system subjects:
  AS 1: The HCP edits the account.
  AS 2: The system shows the updated account.
• Tracking: a step with only the system as subject and an output action (AS 2) is converted to “HCP views the updated account”
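Subject flow tracking and perspective conversion both walk the action steps in order while remembering the last non-system subject, as in this sketch. The set of system subjects and the output-verb table are hypothetical.

```python
SYSTEM_SUBJECTS = {"system", "the system"}
# Hypothetical output verbs mapped to their user-perspective counterpart.
OUTPUT_VERBS = {"displays": "views", "shows": "views", "presents": "views"}


def track_subjects(action_steps):
    """action_steps: (subject, verb, object) tuples in use-case order.

    Subject flow tracking: a step whose only subject is the system is attributed
    to the most recent non-system subject.
    Perspective conversion: an output verb in such a step is rewritten from the
    user's perspective, e.g. "<user> views <object>".
    """
    current_user = None
    result = []
    for subject, verb, obj in action_steps:
        if subject.lower() not in SYSTEM_SUBJECTS:
            current_user = subject                  # remember non-system subject
            result.append((subject, verb, obj))
        elif current_user is None:
            result.append((subject, verb, obj))     # nothing tracked yet
        else:
            verb = OUTPUT_VERBS.get(verb, verb)     # convert output actions
            result.append((current_user, verb, obj))
    return result


steps = [
    ("HCP", "edits", "the account"),
    ("The system", "updates", "the account"),
    ("The system", "shows", "the updated account"),
]
for step in track_subjects(steps):
    print(step)
```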

Page 49

Evaluation – RQs

• RQ1: How effectively does Text2Policy identify ACP sentences in NL documents?

• RQ2: How effectively does Text2Policy extract ACP rules from ACP sentences?

• RQ3: How effectively does Text2Policy extract action steps from action-step sentences?

Page 50

Evaluation – Subjects

• iTrust open source project
  – http://agile.csc.ncsu.edu/iTrust/wiki/
  – 448 use-case sentences (37 use cases)
  – Preprocessed use cases
• Collected ACP sentences
  – 100 ACP sentences
  – From 17 sources (published papers and websites)
• A module of IBMApp (financial domain)
  – 25 use cases

Page 51

RQ1: ACP Sentence Identification

• Apply Text2Policy to identify ACP sentences in iTrust use cases and IBMApp use cases
• Text2Policy effectively identifies ACP sentences, with precision and recall above 88%
• Precision on IBMApp use cases is higher
  – Proprietary use cases are often of higher quality than open-source use cases

Page 52

Evaluation – RQ2: Accuracy of Policy Extraction

• Apply Text2Policy to extract ACP rules from ACP sentences

• Text2Policy effectively extracts ACP model instances with accuracy above 86%

Page 53

Evaluation – RQ3: Accuracy of Action-Step Extraction

• Apply Text2Policy to extract action steps from iTrust and IBMApp use cases
• Text2Policy effectively extracts AS model instances with accuracy above 81%
• Limitations:
  – Subordinate conjunctions such as “or else”, and long phrases

Page 54

Detected Inconsistencies

• No action step (AS) violates the extracted ACPs
• Inconsistent names are used to refer to the same entity (e.g., a user) across different use cases
  – ex. “editor” used in UC 4 of the iTrust use cases actually refers to HCP, admin, and all users in UCs 1, 2, and 4
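Checking action steps against extracted ACP rules can be as simple as looking for steps that match a deny rule. This sketch assumes exact string matching of subject, operation, and resource, which is exactly what breaks when the same entity goes by different names (such as “editor” and “HCP”); the rule and steps are hypothetical examples.

```python
from typing import NamedTuple


class ACPRule(NamedTuple):
    subject: str
    operation: str
    resource: str
    effect: str  # "permit" or "deny"


class ActionStep(NamedTuple):
    actor: str
    operation: str
    resource: str


def find_violations(steps, rules):
    """Return (step, rule) pairs where a step performs an operation that a deny
    rule forbids for the same subject and resource (exact string match)."""
    return [
        (step, rule)
        for step in steps
        for rule in rules
        if rule.effect == "deny"
        and (step.actor, step.operation, step.resource)
        == (rule.subject, rule.operation, rule.resource)
    ]


rules = [ACPRule("HCP", "UPDATE", "patient's account", "deny")]
steps = [
    ActionStep("patient", "OUTPUT", "access log"),
    ActionStep("editor", "UPDATE", "patient's account"),  # "editor" may mean HCP
]
print(find_violations(steps, rules))  # [] because "editor" != "HCP"
```

Normalizing entity names (for example, mapping “editor” to the roles it stands for) before matching would let such a check catch the second step.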

Page 55

Summary

• Natural Language Processing (NLP) for domain-specific purposes is feasible
  – Challenging for general documents
  – Feasible for domain-specific sentences with specific styles
• New techniques are required
  – Addressing unique challenges in software engineering

http://research.csc.ncsu.edu/ase/projects/text2policy/