29
Revised 07-12-2016 1 Ambiguity in Privacy Policies and the Impact of Regulation 45 Journal of Legal Studies (forthcoming) Joel R. Reidenberg, Jaspreet Bhatia, Travis D. Breaux, Thomas B. Norton Abstract Website privacy policies often contain ambiguous language that undermines the purpose and value of privacy notices for site users. This paper compares the impact of different regulatory models on the ambiguity of privacy policies in multiple online sectors. First, the paper develops a theory of vague and ambiguous terms. Next, the paper develops a scoring method to compare the relative vagueness of different privacy policies. Then, the theory and scoring are applied using natural language processing to rate a set of policies. The ratings are compared against two benchmarks to show whether government-mandated privacy disclosures result in notices less ambiguous than those emerging from the market. The methodology and technical tools can provide companies with mechanisms to improve drafting, enable regulators to easily identify poor privacy policies and empower regulators to more effectively target enforcement actions. * Respectively, Stanley D. and Nikki Waxberg Professor of Law, Fordham University and Visiting Research Collaborator, CITP, Princeton; Ph.D. candidate, Carnegie Mellon University; Assistant Professor of Computer Science, Carnegie Mellon University; Privacy Fellow, Fordham CLIP. Work on this project was supported in part by NSF Awards #1330596 and # 1330214, the Carnegie Mellon Requirements Engineering Laboratory, and a Fordham Faculty Fellowship. The authors would like to thank Stephen Broomell for his advice on testing validity and Stephanie Tallering for her assistance with coding. The authors benefited from very helpful comments on earlier drafts by participants at PLSC 2015 and the University of Chicago Coase-Sandor Institute Conference on “Contracting over Privacy.”

Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

1

Ambiguity in Privacy Policies and the Impact of Regulation

45 Journal of Legal Studies (forthcoming)

Joel R. Reidenberg, Jaspreet Bhatia, Travis D. Breaux, Thomas B. Norton

Abstract

Website privacy policies often contain ambiguous language that undermines the purpose and value of privacy notices for site users. This paper compares the impact of different regulatory models on the ambiguity of privacy policies in multiple online sectors. First, the paper develops a theory of vague and ambiguous terms. Next, the paper develops a scoring method to compare the relative vagueness of different privacy policies. Then, the theory and scoring are applied using natural language processing to rate a set of policies. The ratings are compared against two benchmarks to show whether government-mandated privacy disclosures result in notices less ambiguous than those emerging from the market. The methodology and technical tools can provide companies with mechanisms to improve drafting, enable regulators to easily identify poor privacy policies and empower regulators to more effectively target enforcement actions.

* Respectively, Stanley D. and Nikki Waxberg Professor of Law, Fordham University and Visiting Research Collaborator, CITP, Princeton; Ph.D. candidate, Carnegie Mellon University; Assistant Professor of Computer Science, Carnegie Mellon University; Privacy Fellow, Fordham CLIP. Work on this project was supported in part by NSF Awards #1330596 and # 1330214, the Carnegie Mellon Requirements Engineering Laboratory, and a Fordham Faculty Fellowship. The authors would like to thank Stephen Broomell for his advice on testing validity and Stephanie Tallering for her assistance with coding. The authors benefited from very helpful comments on earlier drafts by participants at PLSC 2015 and the University of Chicago Coase-Sandor Institute Conference on “Contracting over Privacy.”

Page 2: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

2

TABLE OF CONTENTS 1. Introduction ............................................................................................................................................ 2

2. Defining and Measuring Ambiguity ............................................................................................. 3

2.1 Taxonomy of Vague and Ambiguous Terms through Grounded Theory ............... 3

2.2 Vague Terms ...................................................................................................................................... 8

2.3 Comparative Levels of Ambiguity ............................................................................................. 9

2.4 Ambiguity through Incompleteness ..................................................................................... 12

3. Scoring Vagueness ............................................................................................................................ 12

3.1 The Landscape of Vagueness in Privacy Policies............................................................ 13

3.2 The Scoring Model ......................................................................................................................... 13

4. Comparative Scores and the Impact of Regulation ........................................................... 15

4.1 Company Scores for Unregulated Disclosures ................................................................ 15

4.2 Scores for Regulated Disclosures ........................................................................................... 17

4.3 Normative Role of Privacy Notice Regulation ................................................................. 20

5. Public Policy Considerations: Technological Tools, Linguistic Guidelines and Reporting ........................................................................................................................................................ 21

5.1 Technical Tools ............................................................................................................................... 21

5.2 Linguistic Guidelines .................................................................................................................... 22

5.2 Reporting Framework ................................................................................................................. 23

6. CONCLUSION ....................................................................................................................................... 23

1. INTRODUCTION

Privacy policies often contain ambiguous language describing website practices for data processing activities such as collection, use, sharing, and retention. While scholars have shown weaknesses in the readability of privacy policies (McDonald and Cranor 2008; Pollach 2007; Jenson and Potts 2004) and weaknesses in the substantive protections (Marrotta-Wugler 2015; Pollach 2007), they have not focused carefully on policy ambiguity. Ambiguity regarding these practices undermines the purpose and value of a privacy policy for website users. Without clear affirmative statements, privacy policies are, in effect, meaningless. They would convey no true indication to users of the website’s actual practices and they would provide declarations that would be unenforceable. On a practical level, ambiguity

Page 3: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

3

also challenges the usability of privacy technologies for user empowerment: clarity in privacy practices is a necessary prerequisite to empowering users to make informed decisions.1 This paper will explore the problem of ambiguity in policy language. In Part II, we develop a theory for the definition of ambiguous terms and for the measurement of such terms. In Part III, we develop a scoring method to compare the relative vagueness of different privacy policies. In Part IV, we apply the theory and method using natural language processing (NLP) techniques to score a set of privacy policies for clarity and comparison. We then use these comparative rankings to examine whether regulation improves the clarity of privacy policies. To test the impact of regulation, we compare the ambiguity of policy language under three conditions: 1) no privacy regulation; 2) regulation under Gramm-Leach-Bliley Act; and 3) regulation under the US-EU Safe Harbor inter-governmental agreement. The results provide normative insight on the role of privacy notice regulation. In Part V, we address a number of practical public policy considerations resulting from our scoring. The techniques and corresponding technical tools can provide companies with a useful mechanism to improve the drafting of their policies. At the same time, automated tools embodying our theory and scoring method will enable regulators to easily scan industries and companies for poor language in their privacy policies. Such inexpensive scans revealing problems with privacy policy language then empowers regulators to more effectively target defective privacy policies for remedial action.

2. DEFINING AND MEASURING AMBIGUITY

2.1 Taxonomy of Vague and Ambiguous Terms through Grounded Theory

Ambiguity arise when a statement is incomplete and missing relevant information, or when a word or phrase has more than one possible interpretation and the reader is uncertain about which interpretation the author intended. Linguists often address vagueness as a form of ambiguity. (Massey 2014) In contract theory, vagueness connotes a distribution around a norm without a clear delineation while ambiguity refers to situations where a word may have at least two meanings. (Farnsworth 1999 § 7.8 ) In each case, multiple interpretations can arise when a statement is incomplete, or when a generic word or phrase is used in place of a more specific word or phrase. When a website privacy policy uses vague or ambiguous terms, the language choices

1 For example, the joint Carnegie Mellon University, Fordham University and Stanford University usable privacy project seeks to combine crowd sourcing, natural language processing and machine learning to develop browser plug-in technologies that will automatically interpret privacy policies for users. (Usable Privacy Project, 2016) If policies are too ambiguous, automated processing will be frustrated.

Page 4: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

4

dilute the ability of a policy to describe the website’s actual practices. This study will, however, focus on vagueness where terminology lacks specificity or context. Because privacy policies summarize an organization’s data practices, it is not surprising that policies include vagueness. There are at least two motivations for introducing vagueness: (1) the practices include divergent or separate situations where actions do and do not occur, in which case the action “may” occur, depending on what situation the individual encounters; and (2) there are foreseeable, yet unrealized actions that “may” occur in the future, and the policy authors wish to be flexible to accommodate those future actions without changing the policy. In the case of the first motivation, we believe changes to some policy statements can clarify under what situations the action does or does not occur, resulting in a less vague policy. However, the second motivation to accommodate flexibility is at best a form of inaccuracy whether the result of hedging to cover unknown existing internal practices or unknown changes and at worst misleading and misrepresentative.2 To demonstrate this effect, we show a few illustrative examples from the Barnes & Noble privacy policy (2013) concerning personal information that are commonly found in other policies. The Barnes & Noble policy includes two statements that describe the possibility of collection:

(1) “Depending on how you choose to interact with the Barnes & Noble enterprise, we may collect personal information from you . . . .”

(2) “We may collect personal information and other information about you from business partners, contractors and other third parties.”

In statement (1), the collection is conditioned upon how the user interacts with the company. This is vague, because the statement summarizes multiple situations, some of which will include the collection of personal information and some of which will not. To achieve clarity, it would be reasonable to exclude those situations where personal information is not collected, and to focus on where personal information “will be collected.” In contrast, statement (2) is vague because the conditional situations are not described, thus all third-party transactions are summarized into a single statement. By separating these statements and iterating over the different categories, the policy authors can exclude prospective collections (envisioned, but not actual collections) and those situations where personal information is not collected. Another attribute of vagueness concerns the vague conditions and purposes under which information is used. In statement (3), below, the Barnes & Noble policy links the collection to a broad purpose (improving customer experience) under a general 2 The desire for flexibility might also be seen as a reservation by the website to hold an option on the ability to engage in unstated data practices. Even if the language explicitly describes an option, the terminology still encompasses alternative meanings and the user will not know the subjective intent of the website, thus, creating vagueness.

Page 5: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

5

assumption about the “necessary” situations that define this broad purpose. An alternative statement would replace the phrase “as necessary” with specific purposes intended to improve customer service.

(3) “We collect your personal information in an effort to provide you with a superior customer experience and, as necessary, to administer our business.”

Vagueness pertaining to standard third-party transactions is also evident in the following two statements from the Barnes & Noble privacy policy:

(4) “In addition, we disclose certain personal information to the issuer of the MasterCard . . . .”

(5) “If you are accessing our goods and services using a Microsoft account, Microsoft may share your personal information with us . . . .”

In statements (4) and (5), the mechanisms for exchanging personal information are coded in software. In the case of credit card transactions, MasterCard, card issuers and acquiring banks each have transaction processing rules that are updated from time to time, but the technical specifications of their respective electronic payment processing infrastructures are less likely to change. The Barnes & Noble policy could thus restrict the kind of personal information it shares to payment information or information for the purpose of completing a purchase.3 In statement (5), Microsoft’s Live Connect API for OAuth 2.0 access to the Microsoft account is also very explicit about what information “may” be shared (first and last name, email address, gender, age) and Barnes & Noble can further commit to which of these information types they “will” collect as encoded by their software. As the case study reveals, the contours of vagueness are very complex. The measurement of vagueness, thus, becomes a valuable marker to signal whether a privacy policy is a meaningful notice of a website’s actual policies and practices and a notice that might give rise to a contractual commitment. The first step in the measurement of vagueness in privacy policies is the development of a rigorous and validated taxonomy of terms that can be used to examine a diverse set of online sectors such as shopping, news and financial services.

Linguistic scholars have identified various forms of ambiguity in the use of language and have classified textual ambiguity in various ways. (Hoffman et al. 2013; Massey et al. 2014; Pollack 2007). Some of these classifications reflect terms may have

3 Statement 4 is also separately confusing because Barnes & Noble would not typically be able to share data directly with a card issuing bank. Barnes & Noble exchanges data from the point of sale to its acquiring bank which in turn shares information specified by MasterCard to the customer’s card issuing bank through the MasterCard network. The circumstance that might give rise to direct sharing from Barnes & Noble to a card issuing bank would be the case of a co-branded card where Barnes & Noble would have a direct relationship with the card issuer.

Page 6: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

6

inherent vagueness. (Massey 2014) For example, many privacy policies use the modal verb “might” to describe data processing activities (“we might collect . . . ”) that may or may not occur in the future. In addition, policies use conditional phrases, such as “when”, “upon”, and “during”, that indicate an event upon which a particular statement becomes true (“upon consent, we will share . . . ”). When multiple modal verbs and conditional terms are used together, readers struggle to actually determine if the described practices occur, or in what combination, or under which specific conditions or how to satisfy those conditions.

For a rigorous analysis of textual ambiguity, the starting point is, thus, the establishment of a typology of ambiguous terms. Since our objective is to provide a qualitative rating of vagueness, we have chosen to focus on this narrower aspect of ambiguity. This means that our scoring will provide relational comparability, but underrate overall ambiguity.

We define our typology based on grounded theory (Glaser and Strauss 1999) to identify vague terms and classify those terms into categories. Three researchers manually performed this analysis using coding (Saldana 2012) to examine a set of policies across a variety of sectors. (Bhatia et al. 2016a). Five policies were used for the initial identification and classification of relevant terms. (Bhatia et al. 2016a)

The analysis resulted in a taxonomy with four categories as shown in Table 1. From a legal perspective, conditional terms are inherently vague because the performance of a stated action or activity will be dependent on a variable trigger. (Farnsworth 1999 § 8.2) Similarly, generalizations are terms that vaguely abstract information practices using contexts that are unclear (e.g. “typically” or “generally”). From the linguistic perspective, modality (modal verbs, adverbs and non-specific adjectives) creates uncertainty with respect to actual action (von Fintel 2006); this includes whether an action is possible, likely, permitted or obligatory, among others. If the action is only permitted, it may never occur, whereas obligatory actions are expected to occur in the future (the difference between “we may” and “we will”). Similarly, numeric quantifiers that are non-specific create ambiguity as to the actual measure. To assure the completeness of the typology, three researchers reviewed 15 policies first in their entirety and then statement by statement to identify vague phrases and determine if they fit these categories or if new categories were required.4 (Appendix, Table A1)

4 This also included an evaluation of cases where the language might appear as a borderline classification.

Page 7: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

7

Table 1

Categories of Vague Terms

Category Description

Condition Action(s) to be performed are dependent on a variable or unclear trigger

Generalization Action(s)/Information Types are vaguely abstracted with unclear conditions

Modalilty (including modal verbs)

Vague likelihood of action(s) or ambiguous possibility of action or event

Numeric quantifier Vague quantifier of action/information type

To see how a sentence may reflect these categories, the phrase “we generally may share personal information we collect on the Site with certain service providers, some of whom may use the information for their own purposes as necessary” contains a condition, generalization, modal verbs and numeric quantifiers.5 These vague terms are annotated in the sentence as shown in Figure 1:

Figure 1 Sentence Annotation

In combination, these six forms of vagueness combine to allow any organization sharing personal information under this statement to share it with anyone for any

5 The original sentence was extended to include “generally” and “as necessary” for illustration purposes; the sentence without these additions is found in Lowe’s website privacy policy on April 27, 2015 at: http://www.lowes.com/en_us/l/privacy-and-security-statement.html.

Page 8: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

8

purpose, as long as the recipient is a service provider. The combination of these six terms further leaves unclear the conditions under which information is shared, and the number or proportion of service providers that engage in this practice.

2.2 Vague Terms To complete the lexicon of vague terms, we used an established coding frame based on the taxonomy in Table 1 and three researchers analyzed a set of 15 policies. (Appendix, Table A1). To assure saturation of terms, we examined three diverse sectors (shopping, telecommunications and employment) and five policies within each sector reflecting a diversity of types of websites within each category.6 We chose privacy policies of major sites that are visited by large numbers of users.7 In this study, we reached saturation after analyzing 5 policies (Barnes & Noble, Lowes, Costco, AT&T, and Comcast) reflecting that our taxonomy of terms was complete for the privacy policy domain.

The resulting set of terms for the taxonomy is shown in Table 2.

Table 2

Results from Applying Taxonomy to Privacy Policies

Category Key Words and Phrases Distribution*

Condition depending, necessary, appropriate, inappropriate, as needed, as applicable, otherwise reasonably, sometimes, from time to time

7.20%

Generalization generally, mostly, widely, general, commonly, usually, normally, typically, largely, often, primarily, among other things

3.63%

Modality (including modal verbs)

may, might, can, could, would, likely, possible, possibly

70.60%

Numeric quantifier anyone, certain, everyone, numerous, some, most, few, much, many, various, including but not limited to

18.60%

*The distribution represents the number of vague terms in the 15 policies belonging to the category divided by the total number of vague terms in 15 policies. See Bhatia et al. (2016a).

6 We considered examining policies from top site rankings, but the various rankings did not assure diversity of sectors. 7 The small number of policies in each category preclude broad generalizations within and across categories, but do enable us to show the value of a score for comparison purposes, including comparison against the financial services benchmark

Page 9: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

9

2.3 Comparative Levels of Ambiguity While the taxonomy results in Table 2 present the terms that obscure the clarity of the policy descriptions, the taxonomy does not address the relative levels of ambiguity among the terms. For example, the following two statements appear to have different levels of ambiguity:

1) “We may generally collect …” 2) “We may collect as necessary …”

Each uses a modal verb (“may”), but the first statement containing the generalization “generally” seems less clear than the second statement containing the condition “as necessary.” The practice described “as necessary” suggests that collection will only occur in exceptional cases while “generally” suggests that collection is likely to occur under broader circumstances. This qualitative difference in clarity may also be linked to the degree of flexibility that the textual language provides to the website. Language designed to give websites greater flexibility is likely to be perceived as more ambiguous. The statement “may collect generally” provides greater flexibility to the website than “as necessary.” Consequently, the generalization term “generally” obscures for the user the website’s activities more than the conditional term “as necessary.” In addition to variations in clarity among the categories, the combination of terms from different categories in the same sentence may also affect the level of ambiguity perceived in descriptions of privacy practices. Figure 2 shows our initial hypothesis regarding the possible cumulative effect of vague terms represented as a lattice.

Figure 2 Combinations of Terms

Page 10: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

10

The lattice begins with a modal statement “we may collect” and then in the next row adds a term from each of the remaining three different categories: a generalization term “generally,” a conditional term “as needed” and a numeric quantifying term “some.” Our initial assumption was that additional terms would increase the vagueness of the statement, i.e. reduce the clarity of the description of the data collection practice. With each successive combination of vague terms, from the first to the second, third and fourth rows in Figure, vagueness would increase until some degree of saturation would occur (i.e., adding additional vague terms would have no significant impact on increasing vagueness). The relative impact of each possible combination is critical to the development of an accurate score for a privacy policy’s ambiguity. To evaluate our initial assumption, we conducted a paired comparison survey.8 The survey results show the relationship of combinations of terms on the level of ambiguity, enables the assignment of relative weights to different combinations of terms from one or more categories. We used the Bradley-Terry model that scales preferences among different pair comparisons to calculate the weights from the paired comparison data. (Turner and Firth. 2012) These results (Bhatia et al. 2016a) are presented in Figure 3 and Appendix Table A2. Figure 3 shows the Bradley-Terry coefficients for the combinations of conditions (C), generalizations (G), modal terms (M), and numeric quantifiers (N).

Figure 3 Bradley-Terry Coefficients

for inter-category combinations

8 A paired comparison survey is a standard statistical technique that collects multiple preferences between two statements from multiple judges and, through the aggregation of the results, establish a matrix of rating comparisons for all possible combinations of the terms being studied.

Page 11: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

11

These results show the quantity that each combination of vague terms contributes to the overall concept of vagueness in the survey: that data practices described with combinations at the left of the chart (CN, C, CM, …) have greater clarity than those practices described with combinations at the right of the chart (GMN, G, GM, …) While phrases with both a conditional term and a vague numeric quantifier (CN) are indistinguishably clear from phrases with just a conditional term alone (C), we can observe how the vagueness taxonomy influences overall vagueness. The arrow moving left in the chart shows that condition terms increase clarity and reduce vagueness: e.g., statements with both a modal term and numerical quantifier (MN) are significantly more vague than similar statements with the addition of a conditional term (CMN). The arrow moving right in the chart illustrates how generalizations significantly increase vagueness: e.g., the MN statements with the addition of a generalization (GMN) are significantly more vague. By comparison, statements with a generalization and modal term (GM) are twice as vague as statements with a condition and a modal term (CM). The results in Figure 3 present the inter-category vagueness. To measure the intra-category vagueness between terms within each of the categories, we conducted additional surveys. (Bhatia et al., 2016a). The survey results indicate that terms within each category have different levels of vagueness. For example, the intra-category vagueness results for the “Generalization” category are presented in Figure 4 and the results for all categories appear in Appendix Table A3.

Figure 4 Bradley-Terry Coefficients for “Generalization” terms

The results in Figure 4 show that within the “Generalization” category, vagueness appears to increase as the adverbs transition from the routine (e.g., typical, normal

Page 12: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

12

or usual) to the unrestricted (e.g., widely, largely, mostly). The full results in Appendix Table A3 show similar differentials. Within the “Conditional” category, the term “as appropriate” was several times more vague than the term “as necessary.” In the “Modal” category, the past tense verbs “might” and “could” are perceived to be more vague than the present tense variants “may” and “can,” respectively. These three observations led Bhatia et al. to conjecture that vagueness increases along three dimensions: authority, wherein discretionary practices are perceived to be more vague than mandatory practices (e.g., “as appropriate” is permissive, whereas “as necessary” is obligatory); certainty, which is the absoluteness with which practices are performed (e.g., “typical” is certain with respect to common cases, whereas “widely” is blurs the boundary between common and exceptional cases); and likelihood, which is the possibility that the practice is performed (e.g., “likely” is more likely than “possibly,” and thus less vague).

2.4 Ambiguity through Incompleteness Lastly, silence in a privacy policy can often introduce ambiguity. (Marotta-Wurgler 2015). For example, if the policy is silent on sharing data with third parties, then the policy fails to convey whether and under what conditions data may be transferred to others. As a result, completeness of the privacy policy will have an impact on the scoring of ambiguity. While there are no legal requirements spelling out all the terms that must be contained in a privacy policy, various templates might be used to determine completeness. For example, the Gramm-Leach-Bliley Act only stipulates that financial service companies provide notice to customers of their privacy policies and that the notice at a minimum contain certain types of disclosures. (Gramm-Leach-Bliley Act, 15 U.S.C. § 6803 (1999). The Federal Trade Commission and the Department of Commerce have each articulated several sets of fair information practices. (FTC 1998, 2000; Department of Commerce 2000) For purposes of this analysis, the existence of four elements will be inventoried: collection, retention, sharing and use. These elements reflect the most significant privacy harms demonstrated through litigation that we believe can be resolved by unambiguous privacy policy statements. (Reidenberg et al 2015)

3. SCORING VAGUENESS With the vagueness taxonomy populated using key words and phrases corresponding to each category, a comparative classification and a completeness indicator can be constructed to score the degree of affirmation or certainty associated with data practices for specific types of personal information. Privacy policy statements about companies that “might collect” are less certain than statements that they “will” or “will not collect” a particular information type. Highly uncertain statements can more easily accommodate a company’s future practices, thus providing these companies more flexibility in the interim to alter those practices. However, highly uncertain statements allow for interpretations that may be untrue, thus giving users a false

Page 13: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

13

sense of privacy. By contrast, if an organization has a policy that is more certain, particularly with more restrictive practices, any new unstated practices would require a change in the policy. Such changes would trigger opportunities for users to re-evaluate their relationship with those companies under the new practices. This opportunity to evaluate policy changes is necessary if the privacy principle of user consent is to have any meaning. Policies containing more certain statements are more likely to increase the opportunity for choice, since those policies will need to be revised each time a new practice is to be covered.

To score privacy policies, the first step is to determine if vague terms are common in privacy policies through an analysis of the landscape of terms found in privacy policies. The frequent existence of vague terms leads to the definition of a scoring model that can then be applied to privacy policies to rank their vagueness against each other.

3.1 The Landscape of Vagueness in Privacy Policies When the taxonomy is applied to the set of 15 privacy policies (Appendix Table A1), every policy in the data set contains vague terms. (Bhatia et al. 2016a). As shown in Bhatia et al. (2016a) and Table 2, the most frequently observed ambiguous terms are modal verbs (M) followed by numeric quantifiers (N); conditions and generalizations lag far behind. (See Appendix Table A4). This suggests that the use of modal terms will dominate all other terms in the calculation of overall vagueness in a privacy policy. However, the Bradley Terry coefficients show the significant impact that conditions, generalizations and numerical quantifiers have on modality: while modality alone (M) scores at 2.865±0.147, the addition of generality to modality (GM) scores more vague at 4.045±0.156, and the addition of conditions to modality (CM) scores less vague at 1.864±0.146. (Bhatia et al. 2016a; Appendix Table A2). In the extreme case that these additional categories of vagueness always appear with modal terms, then over one-third of the total 70.6% of modal terms will score well above or well below the coefficient for modality, alone. (Bhatia et al. 2016a) This can lead to significant differences between policies in terms of overall vagueness and especially pronounced differences within a single category of data practice (e.g., collection, retention, sharing, etc.) in the event that vagueness is concentrated in one area of the policy.

3.2 The Scoring Model Simply counting the number of vague terms in a privacy policy will not provide an adequate measure of ambiguity. For example, the AT&T policy contains 70 vague phrases, which places it at the median of 70 vague phrases and just below Time Warner, which has 85 vague phrases. But this frequency count does not indicate the relative context. Context matters, and a granular scoring model needs to take into account three key variables: 1) the existence of vague terms and their relation to specific categories of data practice (e.g., collection, retention, sharing, and usage); 2)

Page 14: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

14

the relative impact that a combination of vague terms may have on overall ambiguity; and, 3) the completeness of the policy. To accomplish this goal, we propose a scoring model based on a relative comparison of vagueness in phrases for each policy. This score is based on a statistical measure that scales the overall vagueness of individual statements in each policy based on the Bradley-Terry model for paired comparisons. The coefficients that were computed by this method serve for these calculations to rank the vagueness of every phrase in each policy containing a vague term or combinations of vague terms associated with an action-information pairing where one of the four identified data practices (action) is applied to a type of information (information).9 The data practices were extracted using the technique developed by Bhatia et al. (2016b). The vagueness scores appropriately ignore phrases that do not specifically describe a data processing activity or that do not contain any vague terms. This means that non-relevant language, such as a corporation’s philosophy relating to privacy, or unambiguously described data practices will not factor into the vagueness score. For each policy, we can then calculate an aggregate vagueness score by taking the sum of the coefficients for each action-information pair containing vague terms. This policy-specific aggregate score is not, however, sufficient to compare two policies. For example, if a policy is long, it may contain more action-information pairs containing vague terms than a shorter policy, but proportionately be much clearer. To account for this situation, we normalize the aggregate vagueness score by dividing the aggregate score by the total number of action-information pairs in the policy; we call this normalized score the vagueness score. The vagueness score reflects positively on the policy and improves if a policy has more action-information pairs that clearly describe data practices and reflects negatively on the policy and worsens if the policy has more pairs that include vague terms. Moreover, it reflects the total unit vagueness independent of policy length, but relative to the level of contribution to vagueness by each category of vague terms in Table 2. This can be represented by the following equation:

∑ (BTC A-I)

V = -----------------

∑ (A-I)

V=vagueness BTC= Bradley-Terry coefficient A-I= Action-information pair

9 If a statement contained more than one action-information pairing, then all the pairs were included and contributed to the vagueness score.

Page 15: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

15

Lastly, in case a policy has a high level of ambiguity in paragraphs pertaining to key elements that may be masked by clear language elsewhere in the policy, we calculate the vagueness scores for the collection of policy statements addressing each of the four key data practices: collection, retention, sharing and usage. These scores are calculated in the same manner as those for the overall policy. Separately, we report on the completeness of the privacy policies using a scale of 0 to 4. For each element missing from the four data practices (collection, retention, sharing and use), the policy is assigned one point. Thus, a policy containing any description for all four elements will score a 0 and a policy missing all four elements will score a 4.

4. COMPARATIVE SCORES AND THE IMPACT OF REGULATION

4.1 Company Scores for Unregulated Disclosures Applying the scoring model to the privacy policies of companies that do not have specific notice obligations results in the vagueness scores reported in Table 3. Where the ratios are in proximity to each other, they indicate that those policies have similar levels of ambiguity. Where a ratio is double another, the ratios indicate that the policy with the higher ratio is twice as vague as the policy with the lower ratio.

Page 16: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

16

Table 3

Unregulated Companies

Privacy Policy Vagueness Scores

Privacy Policy Total Score

Collect Retain Share Use Completeness

Costco Score 1.02 0.68 0.95 1.51 0.63 0

S.E. 0.06 0.04 0.05 0.08 0.04

JC Penny Score 1.19 1.32 1.44 1.16 1.07 0

S.E. 0.07 0.08 0.08 0.07 0.06

Lowes Score 1.28 0.87 2.15 2.06 1.25 0

S.E. 0.07 0.05 0.11 0.11 0.07

OverStock Score 1.71 1.56 1.44 2.03 1.62 0

S.E. 0.09 0.09 0.08 0.11 0.09

AT&T Score 1.04 0.92 0.45 1.25 0.99 0

S.E. 0.06 0.05 0.04 0.07 0.06

Charter Comm.

Score 1.64 1.54 1.02 1.72 1.84 0

S.E. 0.09 0.09 0.07 0.09 0.1

Comcast Score 1.80 1.71 1.75 1.96 1.66 0

S.E. 0.10 0.10 0.10 0.11 0.09

Time Warner Score 2.09 2.1 2.79 1.72 2.17 0

S.E. 0.11 0.11 0.15 0.09 0.12

Verizon Score 1.38 1.41 0.80 1.48 1.34 0

S.E. 0.08 0.08 0.05 0.08 0.08

Simply Hired Score 1.56 1.44 0.64 1.12 1.97 0

S.E. 0.09 0.08 0.04 0.07 0.11

Mean 1.36 1.34 1.60 1.45 1.47 0

Table 3 shows that the most ambiguous policies among the unregulated entities belong to Time Warner, with Comcast, Overstock.com, and Charter Communications clustered close behind. These policies use large numbers of ambiguous modal verbs

Page 17: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

17

and quantifiers. (Appendix Table 3) For example, the Comcast policy describes sharing with third-parties using both a modal verb and numeric quantifier:

“In certain situations, third party service providers may transmit, collect, and store this information on our behalf to provide features of our services.”

By contrast, Costco’s language describing sharing with third parties is more direct:

“We do not otherwise sell, share, rent or disclose personal information collected from our pharmacy pages or maintained in pharmacist records unless you have authorized such disclosure, or such disclosure is permitted or required by law.”

By comparison to these most vague policies, the policies belonging to Costco and AT&T are almost twice as clear. Table 3 also shows the vagueness scores for actions to collect, retain, share and use information. The overall mean vagueness across these four data actions varies little from 1.34-1.60; however, the mean variance is not homogenous across practices (collect variance =0.21, retain variance=0.52, share variance =0.10, and use variance=0.30). This variance across practices shows divergent uses of vague terms across companies, with the least consistency across policy descriptions of retention practices, and the most consistency around descriptions of sharing practices. Notably, companies such as Comcast, and Time Warner score higher than average vagueness in all four data practice categories. For the website user, however, Overstock’s high vagueness score for sharing (2.03) presents a more significant, or fundamentally different, privacy risk than Comcast’s vagueness regarding collection (1.71) and retention (1.75). Vagueness with respect to sharing is significant because third parties are rarely identified in privacy policies and most privacy policies disclaim responsibility for the data practices of the unnamed third parties. Vagueness with respect to collection and retention affords companies greater flexibility in broadening what kinds of information they are potentially collecting. This may or may not present heightened privacy risks. However, when combined with ambiguous sharing terms, website users will not be able to ascertain exactly what information may be at risk of sharing with third parties. All the policies not subject to regulation were complete.

4.2 Scores for Regulated Disclosures Because the score ratios are designed to compare the clarity of policies against each other and do not provide a minimum level of acceptability for ambiguity, the Model Privacy Form adopted under the Gramm Leach Bliley Act can serve as an informative target benchmark for a regulated notice. This model form was adopted by regulatory agencies after careful analysis and testing of language options. (Levy and Hastak 2008) In fact, eight federal financial service regulatory agencies approved the

Page 18: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

18

language used in this standardized privacy disclosure statement. Financial service providers may use the model form to satisfy their obligations under the Gramm-Leach-Bliley Act, though they are not required to adopt its language. Table 4 presents the vagueness score calculations for a set of five large, national financial institutions that adopted privacy policies based on the Model Privacy Form.

Table 4

Financial Services’ Vagueness Scores

Privacy Policy Total Score

Collect Retain Share Use Complete

Bank of America Score 0.96 0.48 2.87 1.03 0 0

S.E. 0.05 0.03 0.15 0.06 0

Capital One Score 0.52 0.58 2.87 0.38 0 0

S.E. 0.03 0.03 0.15 0.02 0

Citi Group*

Score 0.45 0.58 - 0.43 0 1

S.E. 0.03 0.03 - 0.03 0

JP Morgan Score 0.36 0.48 0 0.56 0 0

S.E. 0.02 0.03 0 0.03 0

PNC* Score 0.35 0.58 - 0.31 0 1

S.E. 0.02 0.03 - 0.02 0

Mean 0.52 0.54 1.91 0.54 0

*The Citi Group and PNC policies do not talk about retention practices. The mean score for retention is computed excluding these policies.

Another benchmark can be derived from the US-EU Safe Harbor Agreement etEU Safe Harbor].10 The EU Safe Harbor identifies data practices that must be contained and described in a privacy policy to satisfy European data export requirements, but stops short of providing model language like the Model Privacy Form in the United States.

10 Although the Safe Harbor Agreement was struck down in Schrems v. Ireland, CJEU

C-362/14 (Judgment of Oct. 6, 2015), http://tinyurl.com/occ4mvx., the policies previously written under the agreement are nevertheless a valuable benchmark for a regulatory rule that does not provide explicit language for privacy notices.

Page 19: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

19

The framework was negotiated between the US and Europe and then approved by the US Department of Commerce. Companies may benefit from the EU Safe Harbor if they include specified provisions in their privacy notices and register with the US Commerce Department. Of the 15 companies in our data set (Bhatia et al. 2016a), five are members of the EU Safe Harbor. (US-EU Safe Harbor List, 2015) Table 5 applies the scoring model to these five privacy policies.

Table 5

Safe Harbor Companies’ Vagueness Scores

Policy Policy Total Score

Collect Retain Share Use Complete

Barnes & Noble

Score 2.07 2.19 1.49 2.3 1.78 0

S.E. 0.12 0.12 0.09 0.13 0.10

Career Builder

Score 0.84 0.83 0.81 0.89 0.85 0

S.E. 0.05 0.05 0.05 0.05 0.05

GlassDoor

Score 1.36 1.41 1.23 1.54 1.26 0

S.E. 0.08 0.08 0.07 0.09 0.07

Indeed

Score 0.96 0.8 1.08 1.04 0.94 0

S.E. 0.05 0.04 0.06 0.06 0.05

Monster

Score 0.79 0.86 0.72 1.12 0.58 0

S.E. 0.05 0.05 0.04 0.06 0.04

Mean 1.20 1.22 1.07 1.38 1.08

The mean vagueness score for the financial services policies is considerably lower than the Safe Harbor policies: 0.52 vs. 1.20. This striking two-plus fold difference means that financial services policies are more than twice as clear as the Safe Harbor policies. Similarly, the vagueness scores show that the descriptions of three of the four data practices found in the financial services policies have greater clarity than those found in the Safe Harbor policies. As a benchmark, the Model Privacy Form for the financial services industry holds privacy policies to a higher standard of clarity and allows less ambiguity than the US-EU Safe Harbor.

Page 20: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

20

All the benchmark policies were complete with exception of the Citi Group and PNC policies that were silent on data retention.

4.3 Normative Role of Privacy Notice Regulation Comparing the vagueness scores for the regulated benchmark policies against the unregulated policies shows that the unregulated policies have notably higher scores and use significantly more ambiguous language. The findings indicate that more specific regulation of policy language has a positive impact on the clarity with which privacy policies describe data practices. Table 6 compares the mean vagueness scores of the two benchmarks to the privacy policies of the unregulated companies.

Table 6

Mean Vagueness Scores across Sectors: Regulated and Unregulated Policies

Privacy Policy Vagueness

Score Collect Retain Share Use

Financial Services 0.52 0.54 1.91 0.54 0.00

Safe Harbor 1.20 1.22 1.07 1.38 1.08

Unregulated 1.36 1.34 1.60 1.45 1.47

Mean 1.03 1.04 1.53 1.17 0.85

The overall mean score for vagueness across data collection, retention, sharing and usage is 1.03. While the number of policies per sector is presently small, we see a few mean differences when accounting for outliers or extreme differences. Among those surveyed, telecommunications companies with the exception of AT&T show higher vagueness about collection practices than employment or shopping. The mean vagueness score for the unregulated policies in Table 6 is 1.36, which is over 2.5 times higher than the mean for financial services policies (0.52). Similarly, the mean vagueness score for the Safe Harbor polices is 1.20, which is slightly lower than the unregulated companies, but well over twice as high as the financial services benchmark.11 With respect to the descriptions of data practices (collection, retention, sharing, and use), the unregulated privacy policies have significantly less clarity than

11 This slight difference may be a reflection of self-selection; companies choosing to adhere to Safe Harbor may be more transparent than typical companies. The closeness of the Safe Harbor scores to the unregulated scores, however, suggests that the self-selection effect is limited at most.

Page 21: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

21

the financial services benchmark in all areas except retention, and have mildly less clarity than the Safe Harbor policies in all areas. The more than double difference between the means for financial services policies (0.52) and the Safe Harbor policies (1.20) indicates that a regulatory nudge providing specific model language (financial services) results in less ambiguous policies than a regulatory framework (Safe Harbor) that only states what kind of data practices must be described in a policy.12 The difference also shows that Safe Harbor does a poor job assuring clear descriptions of data practices when compared to the Model Privacy Form. The difference between the means for unregulated policies (1.36) and Safe Harbor polices (1.20) similarly shows that the Safe Harbor did not largely improve the clarity of privacy policies. Overall, the comparisons to the benchmarks indicate that the market produces privacy policies that are more ambiguous than those subject to some form of regulation. The least ambiguous policies were those adopted by financial services companies after government regulators provided their imprimatur to specific language. While financial institutions were not required to adopt the Model Privacy Form, doing so assured compliance with their legal obligations under Gramm-Leach-Bliley. At the same time, the policies with the highest level of vagueness were those of the unregulated companies.

5. PUBLIC POLICY CONSIDERATIONS: TECHNOLOGICAL TOOLS, LINGUISTIC GUIDELINES AND REPORTING

Because the vagueness scores show that privacy policies for unregulated companies are notably ambiguous when compared to the federal benchmark privacy policies for financial institutions, there is a critical need to improve the clarity of online privacy policies. The finding calls for three policy tools. First, technological mechanisms that can easily score the vagueness of large numbers of policies to identify those that are problematic must exist. Second, linguistic guidelines need to be developed so that drafters have a framework to reduce vagueness in policies and so that regulators can point to a set of norms that reduce ambiguity for users. Lastly, a framework is needed to assist privacy policy drafters with reducing vagueness and ambiguity in policies.

5.1 Technical Tools

The ability to identify and score large numbers of privacy policies for vagueness can help drafters recognize and improve the clarity of privacy policies and help regulators

12 Interestingly, the unregulated policies are less vague with respect to sharing than the Safe Harbor policies.

Page 22: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

22

identify industries or companies for improvement or enforcement actions. Natural language processing (NLP) and machine learning (ML) tools can provide this functionality. The NLP tools can take annotated privacy policies and, using the taxonomy in Table 2 and the vagueness scoring method based on the coefficients in Figure 3 generate comparable vagueness scores for the policy.13 The ML tools can be developed, trained and used to extract policy language relevant to specific data practices and used to annotate policies for analysis.14 These extracted paragraphs can then be scored separately by NLP tools. These processes would enable drafters to easily identify ambiguity issues in their policies. The causes of vagueness may be due to a desire for flexibility, or it may be due to the policy author’s incomplete knowledge about the actual data practices. In this last respect, a vagueness score could be used as motivation for conducting internal audits to rationalize the use of less vague language for specific practices. Such audits could improve internal accountability and transparency, resulting in a company’s rise to a higher standard of care in regards to data protection and privacy.

Because these processes are automated, they can be easily executed on large numbers of privacy policies. With a database of thousands of privacy policies, scores could be generated automatically and outliers flagged for investigation by relevant regulatory authorities. For example, the Global Privacy Enforcement Network composed of data privacy regulators around the world conducted a manual sweep of Internet sites to identify transparency issues for privacy. (Privacy Commissioner of Canada 2013). This can be done on a much larger scale through these automated tools.

5.2 Linguistic Guidelines We propose a set of guidelines designed to yield statements that will, through the use of these natural language processing tools, measurably show improvement in clarity. The goal of these guidelines is a reduction in the vagueness score.

Four principles can be applied to improve the vagueness score corresponding clarity:

avoid terms in Table 2 that are shown to be problematic, specifically generalizations and modal verbs, and avoid terms in Appendix Table A3 that increase vagueness;

use a glossary of key terms so that a company’s legal counsel can

standardize terminology with a broad range of software developers (website developers, mobile app developers, database administration, and backend office administration).

13 These NLP tools were developed for this paper and will be released as part of the Usable Privacy Project. See http://usableprivacy.org. 14 These ML tools are under development by the Usable Privacy Project and will be released once completed. See id.

Page 23: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

23

signal when the meaning of terms changes within a policy such that automated tools can readily detect the changed meaning

5.3 Reporting Framework

Beyond these guiding principles, we propose a framework to enable public reporting of the vagueness scores to enhance the public transparency of ambiguous privacy policies. The framework consists of two elements. First, we propose that the vagueness scores represent measurements that are comparable across privacy policies. These scores represent the vague quality of a policy rather than the actual practice of the website or the substantive content of the privacy protections being offered. Second, we propose that the scores for the Model Privacy Form be used as a benchmark standard for acceptable ambiguity against which other policies be measured. In the future, regulators may wish to present new scores and thresholds that companies can seek to achieve through better policy language that consumers can understand.

6. CONCLUSION If privacy policies are to describe data practices in a meaningful way, the clarity of language is a critical feature. Policies that obfuscate data practices fail to provide consumers with adequate or appropriate notice of the treatment of their personal information. The measurement and comparison of vagueness across different privacy policies can be used to reveal these failures. On a normative level, the scoring and comparison of ambiguity shows that descriptions of data practices suffer a lack of clarity when regulation does not address the kinds of terms to be used in a policy. And more broadly, the approach and techniques for scoring ambiguity can be generalized for application to other consumer oriented legal documents. The tools can, for example, be deployed to analyze the clarity of end user agreements. To deploy the tools in this and other similar contexts, several steps will need to be taken. First, some of the terms in Table 3 may be domain specific to privacy policies and, as a result, these terms need to be updated to reflect the type of document that will be analyzed. Updates may be accomplished by crowd sourcing the review of a handful of documents, since saturation can be achieved without an assessment of large numbers of documents. Second, the set of documents to be scored needs annotations that identify the actions and information types to be assessed. These annotations can be crowd-sourced or potentially accomplished through the use of machine learning. Lastly, the granular elements that would be valuable to score separately need to be identified. With these elements in place, the scoring tool can then provide vagueness scores for the documents and their sub-elements. To enhance the interpretation and

Page 24: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

24

identify threshold problems, a benchmark document can provide scores for comparison.

APPENDIX

TABLE A1

Selected Privacy Policies

Type of Site Policy Last policy update Shopping Barnes and Noble 05/07/2013 Shopping Costco 12/31/2013 Shopping JC Penny 05/22/2015 Shopping Lowes 04/25/2015 Shopping Over Stock 01/09/2013 Telecommunications AT&T 09/16/2013 Telecommunications Charter Communication 05/04/2009 Telecommunications Comcast 03/01/2011 Telecommunications Time Warner 09/2012 Telecommunications Verizon 10/2014 Employment Career Builder 05/18/2014 Employment Glassdoor 09/09/2014 Employment Indeed 2015 Employment Monster 03/31/2014 Employment SimplyHired 4/21/2010

Page 25: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

25

TABLE A2

Bradley-Terry Vagueness Coefficients

Category of Vagueness

Coefficient Standard Error

CN 1.619 0.1461

C 1.783 0.1458

CM 1.864 0.1457

CMN 2.125 0.1455

CG 2.345 0.1456

CGN 2.443 0.1457

MN 2.569 0.1458

N 2.710 0.1461

M 2.865 0.1466

CGMN 2.899 0.1467

CGM 2.968 0.1470

GN 3.281 0.1485

GMN 3.506 0.1500

G 3.550 0.1503

GM 4.045 0.1555 C= condition, G= generalization, M= modal term, N= numeric quantifier Source: Bhatia, Jaspreet, Travis D. Breaux, Joel R. Reidenberg, and Thomas B. Norton. “A Theory of Vagueness and Privacy Risk Perception.” Accepted paper at IEEE 24th International Requirements Engineering Conference, 2016

Page 26: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

26

TABLE A3

Vague term Coefficient

Standard Error

Co

nd

itio

na

lity

as needed 0.00 0.00 as necessary 0.01 0.15

as appropriate 0.70 0.14 depending 0.77 0.14 sometimes 1.20 0.15

as applicable 1.37 0.15 otherwise reasonably

determined 1.52 0.15

from time to time 1.81 0.15

Ge

ne

rali

ty

typically -0.38 0.11 normally -0.34 0.11

often -0.15 0.11 general -0.11 0.11 usually -0.04 0.11

generally 0.00 0.00 commonly 0.03 0.11

among other things 0.64 0.11 widely 0.67 0.11

primarily 0.70 0.11 largely 1.25 0.13 mostly 1.71 0.14

Nu

m. Q

.

certain -0.53 0.22 most -1.21 0.24 some 0.00 0.00

Mo

da

lity

likely -0.32 0.13 may 0.00 0.00 can 0.42 0.13

would 0.60 0.13 might 0.76 0.13 could 0.96 0.14

possibly 1.78 0.15 Source: Bhatia, Jaspreet, Travis D. Breaux, Joel R. Reidenberg, and Thomas B. Norton. “A Theory of Vagueness and Privacy Risk Perception.” Accepted paper at IEEE 24th International Requirements Engineering Conference, 2016

Page 27: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

27

TABLE A4

Frequency of Relevant Vague Terms by Category and Policy*

Policy Condition General-ization

Modality Numeric

Quantifier

Complete- ness

Sh

op

pin

g Barnes & Noble 12 4 98 24 0

Costco 6 7 50 7 0

JC Penny 6 0 29 12 0

Lowes 2 0 62 11 0

OverStock 1 1 19 3 0

Te

leco

m

AT&T 3 0 52 15 0

Charter Comm. 8 4 81 41 0

Comcast 20 9 91 13 0

Time Warner 1 6 47 31 0

Verizon 14 1 101 19 0

Em

plo

ym

en

t

Career Builder 1 3 28 8 0

GlassDoor 5 3 42 11 0

Indeed 0 1 33 6 0

Monster 3 0 28 2 0

SimplyHired 1 3 55 12 0

* These frequency counts reflect each instance in which a word in the taxonomy represented in Table 2 is used in association with a type of personal information and a data processing action applied to the information. See Travis D. Breaux and Florian Schaub, Scaling Requirements Extraction to the Crowd: Experiments on Privacy Policies, 22nd IEEE International Requirements Engineering Conference (RE'14), Karlskrona, Sweden, pp. 163-172, Aug. 2014.

Page 28: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

28

REFERENCES Barnes & Noble. “Barnes & Noble Privacy Policy.” Revised May 7, 2013.

http://www.barnesandnoble.com/u/nook-for-windows-privacy-policy/379003719. Bhatia, Jaspreet, Travis D. Breaux, Joel R. Reidenberg, and Thomas B. Norton. “A Theory of

Vagueness and Privacy Risk Perception.” Accepted paper at IEEE 24th International Requirements Engineering Conference, 2016.

Bhatia, Jaspreet, Travis D. Breaux, Florian Schaub. "Privacy Goal Mining through Hybridized

Task Re-composition" ACM Transactions on Software Engineering Methodology, 25(3): Article No. 22, 2016.

Farnsworth, E.A. Contracts, 3rd ed. New York: Aspen, 1999. Federal Trade Commission. “Privacy Online: Fair Information Practices in the Electronic

Marketplace.” https://www.ftc.gov/sites/default/files/documents/reports/privacy-online-fair-information-practices-electronic-marketplace-federal-trade-commission-report/privacy2000.pdf (accessed March 15, 2016).

Federal Trade Commission. “Privacy Online: A Report to Congress.”

https://www.ftc.gov/sites/default/files/documents/reports/privacy-online-report-congress/priv-23a.pdf (accessed March 15, 2015).

Glaser, Barney G., and Anselm L. Strauss. The Discovery of Grounded Theory: Strategies for

Qualitative Research. New Brunswick: Aldine Transaction, 1999. Hoffman, Paul, A. Lambon Ralph, Matthew, and Timothy T. Rogers. “Semantic Diversity: A

Measure of Semantic Ambiguity Based on Variability in the Contextual Usage of Words.” Behavioral Research 45 (2013): 718-730.

Interagency Notice Project. Consumer Comprehension of Financial Privacy Notices: A Report

on the Results of Quantitative Testing, by Alan Levy and Manoj Hastak. 2008. http://www.sec.gov/comments/s7-09-07/s70907-21-levy.pdf (accessed March 15, 2016).

Jensen, Carlos, and Colin Potts. “Privacy Policies as Decision-Making Tools: An Evaluation of

Online Privacy Notices.” Paper presented at the proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’04), Vienna, Austria, April 24-29, 2004.

Marotta-Wurgler, Florencia. “Does ‘Notice and Choice’ Disclosure Regulation Work? An

Empirical Study of Privacy Policies.” Working Paper, January 15, 2015. Marotta-Wurgler, Florencia. “Understanding Privacy Policies: Content, Self-regulation, and

Market Forces.” Paper presented at the proceedings of the University of Chicago Coase-Sandor Institute “Contracting over Privacy” Conference, Chicago, Illinois, October 16-17, 2015.

Page 29: Ambiguity in Privacy Policies and the Impact of Regulationshmat/courses/cs5436/reidenberg.pdf · 2018-04-21 · actual collections) and those situations where personal information

Revised 07-12-2016

29

Massey, A.K., Rutledge, R.L., Antón, A.I., and P.P. Swire. “Identifying and Classifying Ambiguity for Regulatory Requirements.” Paper presented at the IEEE 22nd International Requirements Engineering Conference, Karlskrona, Sweden, August 25-29, 2014.

McDonald, A., and Lorrie Faith Cranor. “The Cost of Reading Privacy Policies.” I/S Journal of

Law and Policy for the Information Society 4 (2008): 540-565. Office of the Comptroller of the Currency, Treasury (OCC), Board of Governors of the Federal

Reserve System (Board), Federal Deposit Insurance Corporation (FDIC), Office of Thrift Supervision, Treasury (OTS), National Credit Union Administration (NCUA), Federal Trade Commission (FTC), Commodity Futures Trading Commission (CFTC), and Securities and Exchange Commission (SEC). “Final Model Privacy Form Under the Gramm-Leach-Bliley Act.” Federal Register 74, no. 62890-62994 (December 1, 2009).

Office of the Privacy Commissioner of Canada. “Privacy Enforcement Authorities Launch First

Ever International Internet Privacy Sweep.” May 6, 2013. Accessed March 14, 2016. https://www.priv.gc.ca/media/nr-c/2013/nr-c_130506_e.asp.

Pollach, Irene. “What’s Wrong with Online Privacy Policies?” Communications of the ACM 50

(2007): 103-108. Reidenberg, Joel R., Russell, N. Cameron, Callen, Alexander, and Sophia Qasir. “Privacy Harms

and the Effectiveness of the Notice and Choice Framework.” I/S Journal of Law & Policy for the Information Society 11 (2015): 485-524.

Time Warner. “Time Warner Safe Harbor Certification.” Accessed March 14, 2016.

https://safeharbor.export.gov/companyinfo.aspx?id=28829. Saldaña, Johnny. The Coding Manual for Qualitative Researchers. London: Sage, 2012. Turner, Heather, and David Firth. “Terry Models in R: The BradleyTerry2 Package.” Journal of

Statistical Software 48(9): 1-21. Usable Privacy Project. “Usableprivacy.org.” Accessed March 14, 2016. http://usableprivacy.org. U.S. Department of Commerce, International Trade Administration. “Notice: Issuance of Safe

Harbor Principles and Transmission to European Commission.” Federal Register 65, no. 45665-45686 (July 24, 2000).

U.S. Department of Commerce. “U.S. Department of Commerce US-EU Safe Harbor List.”

Accessed March 14, 2016. https://safeharbor.export.gov/companyinfo.aspx. von Fintel, Kai. “Modality and Language.” In Encyclopedia of Philosophy - Second Edition,

edited by Donald M. Borchert. Detroit: MacMillan Reference, 2006.