
Towards a Conversion Tracking system for Television that uses Set Top Box data and Conversion timestamps

BRENDAN KITTS, PRECISIONDEMAND

Television advertising is the largest category of advertising in the United States, with over $65 billion paid for ads each year. In the past, rational bidding has been hampered by an inability to determine which ads are causing conversions. On the web, conversions can be tracked using cookies. On TV there has traditionally been no technology available to track a purchase back to an ad view. We show how a new source of data that has only become available in the past few years, Set Top Box data, can be used to develop conversion rate information. Set Top Box units can now measure household viewing of ads, and advertiser data can show whether those households converted. By joining these data sources, conversion rates per media exposure can be developed for the first time in this industry. We discuss the privacy implications of this system and describe an Anonymization process designed to protect both STB viewer and converter privacy, allowing conversions to be measured without identifying who is converting. We analyze the performance of this system in tracking conversions for a large-scale Television Ad campaign.

Categories and Subject Descriptors: C.2.2 [Computer-Communication Networks]: Network Protocols

General Terms: Television, ROI, Conversion Tracking

Additional Key Words and Phrases: Attribution, TV

ACM Reference Format:
Reviewer Version, 2013. Television Conversion Tracking using Anonymized Set Top Box Data. ACM Trans. Embedd. Comput. Syst. 9, 4, Article 39 (March 2010), 6 pages.
DOI:http://dx.doi.org/10.1145/0000000.0000000

1. INTRODUCTION

Television (TV) is larger than any other form of advertising in the United States as measured by advertising spend (IAB, 2012). Time spent per capita watching TV continues to grow in all age and gender groups, as does TV ad spend (Nielsen, 2012). Because of its size, economic importance, dominance in ad dollars, continued growth, and fundamentally similar auction mechanisms, this medium should be of great interest to the ecommerce community. Yet measuring TV’s effects remains elusive. On the web, conversions can be tracked using cookies. There is no such tracking mechanism on TV: viewers simply see ads and then sometime later go to a store or website and purchase.

The lack of conversion tracking on TV has arguably led to a proliferation of untargeted, irrelevant ads. Solving the TV conversion tracking problem could be of great significance for computational advertising.

In this paper we describe work toward automating conversion tracking for TV ads using Set Top Boxes.

Author’s addresses: Brendan Kitts, PrecisionDemand, 821 Second Ave, Seattle, WA 98104, USA.
Permission to make digital or hardcopies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credits permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].
© 2010 ACM 1539-9087/2010/03-ART39 $15.00
DOI:http://dx.doi.org/10.1145/0000000.0000000

ACM Transactions on xxxxxxxx, Vol. xx, No. x, Article xx, Publication date: Month YYYY


39:2 B. Kitts et al.

2. PRIOR WORK

Despite the revenue and high stakes involved in TV advertising, the most widely used methods for tracking TV effects have changed little since the 1950s. The early pioneers of television such as Nielsen set up “panels”: groups of paid viewers who wrote down their activity in diaries. Today this is still the main method used for tracking TV effects. Unfortunately the method has significant limitations, including size, cost, and extrapolation error, and there have been several well-publicized problems (Segal, 2007; Schneider, 2009; Anders, 2010).

Recently IPTV has opened the possibility of tracking conversions from viewers who could use their TV sets to navigate to websites and purchase. However only 8% of TV sets support IPTV as of the time of printing (McDonough, 2012), and even if this gains widespread adoption, people can still purchase cross-channel from stores, websites, or by calling customer support, leaving no trace of their having learned about the product from TV.

There is strong evidence that cross-channel and delayed conversions may be the majority of the effects due to television. Several studies have suggested that TV works at “the top of the funnel” (Nelson-Field et al., 2012). Our own experimental work supports this hypothesis. We have documented post-campaign conversions from live TV campaigns that are three times larger than those that could be observed and attributed during a campaign, and which persisted at significant levels over 465 days after TV advertising. Given that post-event lift is the majority of TV’s effects, a method is needed that can capture these delayed and cross-channel conversions.

3. OVERVIEW

The method we describe exploits an important new source of data which has only recently become available. Set Top Box data makes it possible to measure ad delivery to specific households. Each household receives a certain exposure of ads, and then either converts or does not convert. By analyzing ad weight and targeting, it is possible to identify how consumers are responding to TV advertising. This method is similar to “view conversions” that have been used online (Chandler-Pepelnjak and Song, 2000), but we have not seen any reported applications in the TV space, nor was it really possible until just a couple of years ago. The method has some major advantages: it can utilize large populations, and it is implementable using existing television hardware.

The structure of the paper is as follows. We first discuss methods for Set Top Box data processing. We then discuss how we bring together four data sources to see ad exposures and conversions. Working with TV viewing and conversion data creates new responsibilities for privacy protection, so we next discuss a new Anonymization technique that ensures personal information is stripped. We then show experimental results using the technique, and finish with a discussion of the importance of ROI tracking for TV.

4. HOW SET TOP BOXES WORK

Set Top Boxes are small devices that interpret the satellite or cable signal and turn it into content that can be displayed by television sets. Set Top Boxes emerged due to the rapidly growing variety of TV signals and services that occurred from 1990 to 2010, the increase in cable and satellite subscriptions over the same period, and the move to all-digital broadcasting in the United States in 2009. Amongst other things, Set Top Boxes (i) demodulate cable signals, (ii) interpret High Definition TV (HDTV), (iii) support Video On Demand (VOD) and Pay Per View (PPV), (iv) support DVR (digital video recorder) functionality, and (v) support Satellite, Cable, IPTV and Terrestrial antenna digital broadcast protocols (Figure 1).

Set Top Boxes have quietly revolutionized the television industry. Since 1996, Set Top Box penetration has increased from about 79% of households to over 91.5% in the United States (Nielsen, 2012). Many of these units are now capable of capturing usage data. Currently about 61% of TV households have a Set Top Box with a return path capability (IAB, 2012). The specific kind of return path varies with the data connection; for example, IPTV installations can send data back in real time, whereas Direct Broadcast Satellite often uses the phone line to update the unit once per day.

Rentrak, Nielsen, and other companies have begun to use this return path data to augment audience ratings. Usage data is generally collected at Head-end (MSO), Front End Servers (Satellite) and Internet Service Provider receiver servers (IPTV). Rentrak is currently reported to have 8.6 million households being used for their audience tracking information (Rentrak, 2012a). However, work has really only begun in this area.

In this paper we consider the potential for the technology to support conversion tracking, which has been the “holy grail” for the television industry.

Figure 1. Diagram showing three TV providers, and return path data for each.

5. SET TOP BOX DATA PROCESSING

A Set Top Box transaction record comprises the following tuple (MRC, 2012):

(DeviceID, EventID, DateTime, TimeZone)

where DateTime is recorded by the local Set Top Box unit. Each of these tuples represents a remote control key-press or state change event. The Set Top Box events that we focus on are the “tune” events, in which the user navigates and selects a channel, and the on/off events. We filter the above data to isolate these.
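The filtering step can be illustrated with a minimal Python sketch. The record type mirrors the tuple above, but the event labels ("TUNE", "POWER_ON", "POWER_OFF", "VOLUME") are hypothetical placeholders, since the actual EventID codes depend on the data provider.

```python
from typing import NamedTuple

class StbEvent(NamedTuple):
    device_id: str
    event_id: str   # hypothetical labels; real EventID codes vary by provider
    date_time: str  # local Set Top Box clock, kept as text in this sketch
    time_zone: str

# Assumed labels for the events of interest (tune and on/off events).
KEPT_EVENTS = {"TUNE", "POWER_ON", "POWER_OFF"}

def isolate_tune_and_power(events):
    """Keep only tune and on/off events, dropping other key-presses."""
    return [e for e in events if e.event_id in KEPT_EVENTS]
```

Everything downstream (sessionizing, surf filtering, schedule joins) would then operate on this reduced event stream.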

5.1 Sessionizing

There are two methods for identifying session boundaries:

(1) Set Top Box On/Off events: Sometimes consumers will switch set top boxes off after viewing, and on to start viewing, and these provide a very convenient way of identifying session starts and ends.

(2) Inactive periods: If there are no detected remote control keypresses for a certain number of hours, we determine that the session has ended.

In weblog processing, 30 minutes is a typical setting for session boundaries (Berendt et al., 2006). For Set Top Box data, MRC (2012) has proposed that 6 hours of inactivity should be used for creating a session boundary.

We analyzed the time between last keypress and off events for Set Top Box data. There is a spike in off events after 4 hours of inactivity that may be due to most viewing being prime-time followed by a “cliff” around 4 hours later which coincides with the sleep period.

We also analyzed the time between keypress events of any kind. There are spikes every half hour, which seem to be related to users switching to another program after viewing a previous program. There are also large spikes in tune events at 4 and 24 hours. The 24 hour spike suggests that people may be watching TV at around the same time each day. Based on the suggestive spikes at 4 hours, we use four hours as our inactive period setting (Figure 2).
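The two boundary rules (explicit off events and a four-hour inactive period) can be sketched in Python as follows. This is an illustrative sketch, not the production implementation; the event type names ("TUNE", "ON", "OFF") are assumptions.

```python
from datetime import datetime, timedelta

INACTIVITY_GAP = timedelta(hours=4)  # inactive-period setting chosen in the text

def sessionize(events, gap=INACTIVITY_GAP):
    """Split one device's (timestamp, event_type) pairs into sessions.

    A session ends at an explicit "OFF" event, or when the time to the
    next event exceeds `gap`. Event type names are assumed for this sketch.
    """
    sessions, current = [], []
    for ts, kind in sorted(events):
        # Rule 2: a long inactive period closes the previous session.
        if current and ts - current[-1][0] > gap:
            sessions.append(current)
            current = []
        current.append((ts, kind))
        # Rule 1: an off event closes the session immediately.
        if kind == "OFF":
            sessions.append(current)
            current = []
    if current:
        sessions.append(current)
    return sessions
```

Changing `gap` to six hours would reproduce the MRC (2012) proposal under the same logic.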

[Figure 2 plot: % of events (y-axis) against hours until next event (x-axis, 0 to 26), with spikes marked at 4 hours and 24 hours.]

Figure 2. Hours between tune events by the same device

[Figure 3 plot: % of viewers (y-axis, 0% to 6%) against hours viewed per day (x-axis, 0 to 24).]

Figure 3. Hours viewed per person. Most people view about 2.5 hours of TV per day.

Statistic | Mean | Median | Mode | Upper 90th | Lower 10th
Hours per day | 4.17 | 3.63 | 2.50 | 7.64 | 1.49

Figure 4. Basic viewing statistics in the Set Top Box data.

5.2 Channel Surfing

One of the most salient behaviors in Set Top Box data is the large number of station request events that occur in quick succession (Figure 5) as users “surf” channels looking for a program to watch. Over 70% of events are “tune surf” events. We filter these by requiring that the time a viewer spent on a station be at least 10% of each bucket (1.5 minutes of each 15-minute bucket) for the station viewing to be recorded. Figure 10 shows surfing that is kept: a viewer who is switching between two programs, but who appears to be watching both programs.
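A minimal Python sketch of this surf filter is shown below. It assumes viewing has already been converted into (station, start, end) intervals in seconds for a single device; that input format and the function name are illustrative assumptions.

```python
from collections import defaultdict

BUCKET_SECONDS = 15 * 60      # 15-minute bucket
MIN_SECONDS_PER_BUCKET = 90   # 10% of the bucket, i.e. 1.5 minutes

def station_buckets(intervals):
    """intervals: iterable of (station, start_sec, end_sec) for one device.

    Returns {(bucket_index, station): seconds viewed}, keeping only entries
    that reach the minimum viewing time, so brief surf visits are dropped.
    """
    seconds = defaultdict(int)
    for station, start, end in intervals:
        t = start
        while t < end:
            bucket = t // BUCKET_SECONDS
            bucket_end = (bucket + 1) * BUCKET_SECONDS
            step = min(end, bucket_end) - t  # portion inside this bucket
            seconds[(bucket, station)] += step
            t += step
    return {k: v for k, v in seconds.items() if v >= MIN_SECONDS_PER_BUCKET}
```

Because time is accumulated per station within a bucket, a viewer alternating between two programs keeps both entries as long as each station reaches the threshold.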

5.3 TVs Left on Continuously

People who leave their Set Top Boxes on all of the time should also be filtered out. Although very high viewing levels could indicate bad data, heavy viewers as a group are actually the highest converters in the data: they convert at 7 times the rate of the lowest viewing-hours segment (Figures 6 and 7). The high TV consumption group of course sees more impressions than the low TV consumption groups. If we normalize by impressions, the high and low consumption groups have similar conversions per impression. This suggests that if impressions are displayed to humans, they tend to convert. In order to filter out the most extreme viewing behavior while leaving the converters in place, we filter out only those viewers whose viewing hours exceed those of 99% of the population.
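One way to express the 99% cutoff is a nearest-rank percentile over per-device viewing hours, as in the Python sketch below. The nearest-rank definition and the function names are assumptions of this sketch; the text does not specify how the percentile is computed.

```python
def percentile_cutoff(values, pct=99):
    """Nearest-rank percentile: smallest value with at least pct% of the
    data at or below it (integer arithmetic avoids rounding surprises)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100)
    return ordered[rank - 1]

def drop_extreme_viewers(hours_by_device, pct=99):
    """Remove devices whose daily viewing hours exceed the pct-th percentile."""
    cutoff = percentile_cutoff(hours_by_device.values(), pct)
    return {d: h for d, h in hours_by_device.items() if h <= cutoff}
```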

5.4 Typical TV Viewing Behavior

Set Top Box data provides an important new window into typical TV viewing hours. Nielsen has reported over 5 hours per day on average (Nielsen, 2009). However, this has always seemed large: how many people do you know who watch 5 hours per day? We know that Set Top Box viewers tend to watch less, and have other pattern changes. In our data set the average hours viewed was close to Nielsen’s number, at around 4.17 hours per day. However, the peak of the distribution (the mode) was only 2.5 hours. Thus the most typical viewing amount is 2.5 hours per day, with a right-skewed distribution around this value, which seems closer to what we might intuitively expect (Figures 3 and 4).

5.5 Virtual persons

Our data is captured at the Set Top Box device level. Multiple viewers can watch the same device. Figure 8 shows what appears to be one person. Figure 9 shows what appear to be three different people watching the same unit: one person who watches Fox and Friends in the morning, another who watches female programming in the evening, and a third who watches children’s programming. At present we aggregate over the different viewers.


[Figure 5 plot: % of events (y-axis, 0% to 30%) against seconds until next event (x-axis, 0 to 30).]

Figure 5. Tune events often occur in quick succession as viewers “surf” channels looking for content to view.

[Figure 6 chart: households grouped into Decile 1 through Decile 10; % of impressions (data labels, in order of appearance: 33%, 14%, 11%, 9%, 8%, 7%, 6%, 5%, 4%, 3%) plotted alongside hours per day.]

Figure 6. Distribution of viewing hours by % of population. About 10% of the population views about 17 hours per day. This group consumes 35% of the impressions.

[Figure 7 chart: conversions per capita (1.0 = lowest conversion rate; y-axis, 0.0 to 7.0) by % of households. Groups and their hours per day (hpd): Dcle 1: 11.1; Dcle 2: 7.6; Dcle 3: 6; Dcle 4: 5; Dcle 5: 4.2; Dcle 6: 3.6; Dcle 7: 3.1; Dcle 8: 2.6; Dcle 9: 2.1; Dcle 10: 1.5; Dcle 11: 0.7.]

Figure 7. Heaviest TV watching group also are responsible for the most sales.

NetW | Person | DateTime | Mins | Program
ESPN | 10195589 | 3/10/12 3:00 PM | 22 | College Basketball
SCIFI | 10195589 | 3/10/12 3:00 PM | 7 | Survivorman
SCIFI | 10195589 | 3/10/12 3:30 PM | 4 | Survivorman
ESPN | 10195589 | 3/10/12 3:30 PM | 26 | College Basketball
ESPN | 10195589 | 3/10/12 4:00 PM | 30 | College Basketball
ESPN | 10195589 | 3/10/12 5:30 PM | 12 | College Basketball
ESP2 | 10195589 | 3/10/12 5:30 PM | 17 | NASCAR Racing
ESP2 | 10195589 | 3/10/12 6:00 PM | 30 | NASCAR Racing
ESP2 | 10195589 | 3/10/12 6:30 PM | 7 | NASCAR Racing
ESPN | 10195589 | 3/10/12 6:30 PM | 2 | College Basketball
SCIFI | 10195589 | 3/10/12 6:30 PM | 21 | Survivorman
ESPN | 10195589 | 3/10/12 7:00 PM | 3 | College Basketball
ESP2 | 10195589 | 3/10/12 7:00 PM | 22 | NASCAR Racing
ESP2 | 10195589 | 3/10/12 7:30 PM | 12 | NASCAR Racing
NICK | 10195589 | 3/10/12 7:30 PM | 29 | Victorious
ESPN | 10195589 | 3/10/12 7:30 PM | 18 | College Basketball
NICK | 10195589 | 3/10/12 8:00 PM | 9 | Big Time Movie

Figure 8. Example viewing record. The demographics for this viewer include “Male”, “owns SUV”, age=“44-45”, interest in spectator sports “motorcycle racing”, “football”, “baseball”, “basketball”.

Net | Person | DateTime | M | Program | Day | P
FNEW | 10274739 | 12/5/11 6:30 AM | 2 | FOX and Friends | Mon | 1
FNEW | 10274739 | 12/5/11 7:00 AM | 30 | FOX and Friends | Mon | 1
OXYG | 10274739 | 12/5/11 7:30 PM | 28 | The Bad Girls Club | Mon | 2
OXYG | 10274739 | 12/5/11 8:00 PM | 24 | The Bad Girls Club | Mon | 2
OXYG | 10274739 | 12/5/11 8:30 PM | 30 | The Bad Girls Club | Mon | 2
OXYG | 10274739 | 12/5/11 9:30 PM | 2 | Bad Girls Club: Season 8 Preview | Mon | 2
FNEW | 10274739 | 12/6/11 7:00 AM | 29 | FOX and Friends | Tues | 1
LIFE | 10274739 | 12/6/11 8:30 PM | 27 | America's Supernanny | Tues | 2
LIFE | 10274739 | 12/6/11 9:00 PM | 30 | America's Supernanny | Tues | 2
LIFE | 10274739 | 12/6/11 11:30 PM | 30 | One Born Every Minute | Tues | 2
LIFE | 10274739 | 12/7/11 12:00 AM | 1 | One Born Every Minute | Wed | 2
FNEW | 10274739 | 12/7/11 7:00 AM | 28 | FOX and Friends | Wed | 1
FNEW | 10274739 | 12/7/11 9:30 PM | 25 | Hannity | Wed | 1
FNEW | 10274739 | 12/8/11 7:00 AM | 21 | FOX and Friends | Thurs | 1
FNEW | 10274739 | 12/9/11 7:00 AM | 29 | FOX and Friends | Fri | 1
AFAM | 10274739 | 12/9/11 7:30 PM | 3 | Santa Claus Is Comin' to Town | Fri | 3
DSNY | 10274739 | 12/9/11 7:30 PM | 20 | Beethoven's Christmas Adventure | Fri | 3
DSNY | 10274739 | 12/9/11 8:00 PM | 22 | Beethoven's Christmas Adventure | Fri | 3
AFAM | 10274739 | 12/9/11 8:00 PM | 4 | The Santa Clause | Fri | 3

Figure 9. Multiple viewers for the same set top box. The demographics of this viewer are “Female”, Second gender =“Male”, “Married”, “2 children”, “Education=Grad School”. We have hand-labeled column “P” to show what could be the viewing from three different individuals – Person 1 who likes to watch Fox news in the morning, Person 2 who watches young female programming, and Person 3 who watches kids programming.

Person | Date | X-Men Origins: Wolverine (ViewMinutes) | Pirates of the Caribbean: At World's End (ViewMinutes)
10175056 | 9/10/11 7:00 PM | 28 | 0
10175056 | 9/10/11 7:30 PM | 17 | 13
10175056 | 9/10/11 8:00 PM | 25 | 5
10175056 | 9/10/11 8:30 PM | 25 | 5
10175056 | 9/10/11 9:00 PM | 17 | 13

Figure 10. Alternating viewing between two programs.


6. TELEVISION SCHEDULE

Our Set Top Boxes record the physical channel number on the device and time of viewing, but not much else. We therefore have to join to the Television Guide to identify the programs that were running during each of these times on each of these stations. There are on average 7,154 distinct named programs and 54,086 30-minute airing timeslots each day, not including local broadcasts.
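The join to the Television Guide can be sketched in Python as a lookup keyed by channel and 30-minute timeslot. The guide structure (a dictionary from (channel, slot start) to program name) and the function names are assumptions made for illustration.

```python
from datetime import datetime

def slot_start(ts):
    """Truncate a timestamp to the start of its 30-minute airing timeslot."""
    return ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)

def join_schedule(views, guide):
    """views: iterable of (device_id, channel, timestamp).
    guide: hypothetical dict mapping (channel, slot_start) -> program name.

    Returns (device_id, channel, program, timestamp) for views whose
    channel and timeslot appear in the guide.
    """
    joined = []
    for device_id, channel, ts in views:
        program = guide.get((channel, slot_start(ts)))
        if program is not None:
            joined.append((device_id, channel, program, ts))
    return joined
```

Views with no guide entry are dropped here; a real pipeline might instead keep them for later inspection.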

Lining up the schedules can be challenging. It is possible for local stations to have a variety of interruptions to the regular national schedule. In order to quantify these effects, we looked for all program names that were of the form “*News*at h*” where h is a number; for example “Fox 5 News at 10”, “Eyewitness News at 4”, and “CBS 2 News at 6PM”. We then compared that hour number to the airing hour. We found that 99.11% of programs aired at their scheduled time. The majority of the remainder were shifted by +1 hour (0.84%; in Figure 11 this is the sum of the +1 and -23 rows, since a one-hour delay that crosses midnight registers as -23). A small number were shifted by +2 hours (0.04%; the -22 row in Figure 11). The majority of the shifts appeared to be due to sports programming that ran long. Sunday was the day on which a schedule change was most likely to occur (6.73%), followed by Saturday (2.76%) and Friday (1.39%). Monday to Thursday had the lowest chance (0.43%).
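As a rough illustration of this check, the Python sketch below extracts the listed hour from a program name and folds raw hour differences so that shifts crossing midnight (such as -23) are read as small positive delays. The regular expression and the folding rule are assumptions of this sketch, not the exact procedure used in the study.

```python
import re

# Matches names such as "Fox 5 News at 10" or "CBS 2 News at 6PM".
NEWS_AT_HOUR = re.compile(r"News.*?\bat\s+(\d{1,2})", re.IGNORECASE)

def listed_hour(program_name):
    """Return the hour number embedded in a '... News at h' name, else None."""
    match = NEWS_AT_HOUR.search(program_name)
    return int(match.group(1)) if match else None

def wrapped_shift(raw_shift, period=24):
    """Fold an hour difference into the range [-period/2, period/2)."""
    half = period // 2
    return (raw_shift + half) % period - half
```

Under this folding, raw shifts of -23 and -22 map to +1 and +2 hours respectively.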

Shift in hours (observed hour minus program listed time) | Sum of hits
0 | 99.11%
1 | 0.73%
-23 | 0.11%
-22 | 0.04%

Figure 11. 99.11% of programs of the form “news at x” where “x” is an hour, air at their scheduled time. The remainder are most likely shifted 1 hour after their scheduled start time.

The processing described above is critical for avoiding illusory view events. After joining the Schedule, we are able to create a set of Set Top Box Viewing Events, each represented by the following tuple:

STBView = (DeviceID, StationEvent, Program, DateTime, TimeZone)

The DeviceID can be linked to the billing name and address for the person who subscribes to the Set Top Box through an Anonymization process described below. We next want to identify whether the Set Top Box viewer saw our ads.

7. AD OCCURRENCES

Ad occurrence data is recorded by several media tracking services including SQAD and BVS. SQAD collects their data from stations, whereas BVS embeds digital watermarks into the ads which are detectable in the video stream as it is received by television sets around the country.

Based on SQAD data there are approximately 19 million TV ad airings per month in the United States, which span the entirety of national, local, broadcast and cable networks. For conversion tracking purposes, we filter the ad occurrence data to the one advertiser for whom we are tracking conversions. We then have a tuple as below which indicates the times that the ad was aired.

AdOcc = (Station, Program, DateTime, TimeZone)

8. CONVERSIONS

When the customer purchases a product, the Advertiser obtains information about the person’s name and address as part of the credit card transaction. This information can be used to create conversion data for modeling the impact of TV advertising.

Advertiser conversion data can include purchase value information, segments assigned by the Advertiser, acquisition and churn date. The most important data is actually the acquisition date, as we will be summing the ad exposures that occurred prior to the conversion. Conversion data contains the following salient fields:

Conversion = (CustomerName, Address, DateTime, TimeZone)

9. JOINING IT ALL TOGETHER

The objective is to join each of the four data sources together and then to start counting ad exposures on each viewer and identify the probability of conversion as a function of weight and targeting. Figure 12 shows the join. This process works by joining Customer Name and Address to Set Top Box viewer, but does so in a way that only results in anonymous records. We describe how this is done in the next section.

Figure 12. Major data sources used in STB conversion tracking

10. AD CONVERSION TRACKING

We now have a set of anonymous persons, the ads they viewed, the programs that they watched, and the date (if any) when that person converted with the advertiser. We can now analyze the effect of advertising on sales.


10.1 Ad Weight

We define Ad Weight A as 1000 times the number of ads viewed by an individual per week of observation per TV household:

$$A(i, \alpha, \tau) = \frac{1000 \cdot \mathrm{imp}(i, \alpha)}{\mathrm{weeks}(i) \cdot \mathrm{TVHH}(i)}$$

where α is the set of ads for an advertiser, imp(i,a) = 1 if person i viewed media instance a, and imp(i,α) is the number of media instances in α that person i viewed. Ad Weight is directly analogous to a Gross Rating Point (GRP), but we use the more generic term “ad weight” because our calculation of ad exposure differs from the calculations usually used for GRPs.
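The Ad Weight calculation reduces to a single ratio, sketched here in Python. The function and parameter names are illustrative, and treating a single person as one TV household by default is an assumption of this sketch.

```python
def ad_weight(impressions, weeks_observed, tv_households=1):
    """Ad Weight: 1000 * impressions / (weeks observed * TV households).

    `impressions` is the count of the advertiser's media instances viewed.
    The default of one household per individual is an assumption.
    """
    return 1000.0 * impressions / (weeks_observed * tv_households)
```

For example, `ad_weight(1, 1)` gives 1000 (one ad per week) and `ad_weight(1, 5)` gives 200 (one ad every five weeks).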

The match that determines whether the viewer saw the ad a is the most expensive part of the above join. It is possible to join where the media instances are one-second viewing buckets, so that a match is only achieved if the ad airs and the user views the media during the same second.

We have found that we can speed up the match (at some loss of fidelity) by increasing the size of the matching bucket. In addition, there are often clock differences between MSOs and Set Top Box hardware units, so using a bucket also improves the reliability of the match. We define a matching time bucket of size B in seconds and truncate the timestamps of both the ad occurrences and the viewing records to that bucket. We then define a match as

If viewminutes(i,m) ≥ B · b then v(i,m) = true

where b is the fraction of the bucket that needs to be viewed, with viewminutes(i,m) expressed in the same time units as B. In the experiments that follow, our bucket size was 15 minutes with b = 0.7. This style of matching is less accurate, and may incorrectly return an ad view when the user had actually switched channels. Therefore, the resulting Ad Weight is an upper bound on the true Ad Weight. However, it creates a 900x speedup relative to one-second buckets, which is valuable in practice. After using this technique, the ad weight calculated for the landscape needs to be reduced as follows:

$$A'(i, \alpha) = A(i, \alpha) \cdot b$$
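The bucketed match and the subsequent adjustment can be sketched in Python as below. The input structures (a dictionary of seconds viewed per device, station and bucket, and a list of ad airings as epoch seconds) are assumptions chosen for illustration.

```python
BUCKET_SECONDS = 15 * 60  # B, the matching bucket size in seconds
MIN_FRACTION = 0.7        # b, the fraction of the bucket that must be viewed

def bucket_of(epoch_seconds, size=BUCKET_SECONDS):
    """Truncate a timestamp (in seconds) to the start of its bucket."""
    return epoch_seconds - epoch_seconds % size

def matched_views(view_seconds, ad_airings, size=BUCKET_SECONDS, frac=MIN_FRACTION):
    """view_seconds: {(device, station, bucket_start): seconds viewed}.
    ad_airings: iterable of (station, epoch_seconds).

    A view counts as an ad exposure if the ad aired on that station in the
    same bucket and the device viewed at least size * frac seconds of it.
    """
    ad_buckets = {(station, bucket_of(ts, size)) for station, ts in ad_airings}
    return {
        key for key, secs in view_seconds.items()
        if (key[1], key[2]) in ad_buckets and secs >= size * frac
    }

def adjusted_weight(weight, frac=MIN_FRACTION):
    """Scale the upper-bound Ad Weight down by the viewed fraction b."""
    return weight * frac
```

Matching on truncated buckets replaces a per-second comparison with a set lookup, which is the source of the speedup at the cost of some false matches.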

10.2 Targeting

Targeting is challenging on TV because ads cannot be routed to individuals. Instead, ads need to be placed on specific programs, rotations, and times of day. In order to do this, a definition of targeting is needed which is compatible with TV systems.

Some authors have discussed targeting as “advertiser ratings” (Duggan, 2012), i.e. the percentage of viewers who are like the converting customer. We believe that this is generally a good definition; our definition of targetedness is equal to “probability of buyer”. We provide two targetedness metrics: (a) Direct Targeting and (b) Demographic Targeting.

10.2.1 Direct Targeting. Direct Targeting looks at what known buyers (converters) of the product are watching, and then creates a “probability of buyer in audience” for each media instance. The method calculates the probability of buyer given the TV programming mix that the individual being scored is watching. In other words, we look at all programs viewed, sum the buyers in that pool, and divide by the viewers. For example, if there were 10 buyers out of 100 viewers on Program A and 1 out of 100 on Program B, and our individual viewed only Programs A and B, then their buyer probability is (10 + 1) / (100 + 100) = 5.5%.
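The pooled calculation in the worked example can be written as a short Python function. The dictionaries of buyer and viewer counts per program are hypothetical inputs used only to illustrate the arithmetic.

```python
def buyer_probability(programs_viewed, buyers, viewers):
    """Pooled probability of buyer across the programs an individual viewed.

    buyers / viewers: dicts mapping program -> count (illustrative inputs).
    """
    total_buyers = sum(buyers[p] for p in programs_viewed)
    total_viewers = sum(viewers[p] for p in programs_viewed)
    return total_buyers / total_viewers if total_viewers else 0.0
```

With 10 buyers of 100 viewers on Program A and 1 of 100 on Program B, the function returns 11 / 200 = 0.055.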

2.1.1. Demographic Targeting. Direct Buyer probability calculation can run into difficulty when there are few conversions. Another method is to use the demographics of media to calculate the probability of a conversion from this media. Using this method, we decompose each individual Set Top Box viewer into a 400 element demographic variable-value vector I. We then compare their demographics to the demographics of purchasers of the advertiser’s product P. This method has the advantage that it will work across all possible TV programs, regardless of the potential sparsity of buyers in the population.

$$r(i, \alpha, \tau) = \frac{P \cdot I}{\|P\| \, \|I\|}$$
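The similarity above is the cosine of the angle between the purchaser profile P and the viewer vector I. A minimal sketch follows, using short hypothetical three-element vectors in place of the 400-element demographic vectors; the function name `demographic_similarity` is an assumption for illustration.

```python
import math


def demographic_similarity(P, I):
    """Cosine similarity r = (P . I) / (||P|| * ||I||) for equal-length vectors."""
    if len(P) != len(I):
        raise ValueError("P and I must have the same number of elements")
    dot = sum(p * i for p, i in zip(P, I))
    norm_p = math.sqrt(sum(p * p for p in P))
    norm_i = math.sqrt(sum(i * i for i in I))
    if norm_p == 0 or norm_i == 0:
        return 0.0  # an all-zero vector carries no demographic signal
    return dot / (norm_p * norm_i)


# Hypothetical vectors; the values are made up for the example.
purchasers = [0.4, -0.2, 0.1]
viewer = [0.8, -0.4, 0.2]   # same direction as the purchaser profile
print(demographic_similarity(purchasers, viewer))
```

Because the measure depends only on direction, a viewer whose demographic profile is a scaled copy of the purchaser profile scores 1, and an opposite profile scores -1.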

11. CONVERSION PROBABILITY CURVES FROM STB DATA

We applied the method at a large online music provider. The company provided records for 794,988 persons who had purchased its music product.

We joined these converting customers to our STB Viewers using the Three-way Anonymization method to ensure that person identities from viewers and purchasers were not disclosed in any way.

Of those 794,988 persons, 506 were detected in our Set Top Box population. We used single-device households in order to maximize accuracy. Those 506 persons generated 674 ad exposure events. 28% of those occurred after exposure to ads, and so we focused on those persons who had been exposed and then converted. We counted all exposures and divided by the length of time that the persons had been observed. This was converted into a weight measure expressed as impressions per thousand households per week; for an individual, 1000 means seeing a single ad each week, and lower values (e.g., 200) mean an ad every 5 weeks. This produced Figure 14.
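The weight measure for an individual can be sketched as follows. This is an illustrative reading of the description above (exposures divided by weeks observed, scaled to a per-thousand basis); the function name `media_weight` is an assumption.

```python
def media_weight(exposures, weeks_observed):
    """Impressions per thousand households per week for one observed unit."""
    if weeks_observed <= 0:
        raise ValueError("weeks_observed must be positive")
    return 1000.0 * exposures / weeks_observed


print(media_weight(1, 1))  # one ad every week
print(media_weight(1, 5))  # one ad every 5 weeks
```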

[Figure 14 panels. Conversion probability for person (0.00% to 0.80%) versus tratio of person (-0.8 to 1). Probability of person converting (-0.001 to 0.009) versus number of exposures (-10 to 50); series: convprob after ads (did this cohort convert?).]

Figure 14: (top) conversion rate per adview for persons exposed to TV ads, (left) tratio versus conversion probability (any exposure), and (c) number of exposures versus the same. Both tratio and exposures are correlated with higher conversion probability.

We did not create a model combining targeting and weight. Instead, targeting results were aggregated, and weight was aggregated, and both were modeled separately. This is a limitation of the research, but we still report these results to see how well they matched the experimental results next.

The weighted least squares curve fits from the Set Top Box data are as follows:

$$E(\mathrm{Conv} \mid \mathrm{tratio}) = 0.0031 \cdot r(i, \alpha, \tau) + 0.0049$$

$$E(\mathrm{Conv} \mid \mathrm{weight}) = (0.000248/1000) \cdot A(i, \alpha, \tau) + (0.00082/1000)$$

The average conversion rate is 0.005078 and so the percentage lift at average for a +1.0 shift in tratio is 0.0031 / 0.00508 = 61%. At 0 media weight, the conversion rate is 0.00192 and percentage increase per 1000 Imp/MHH/Wk is equal to 12.9%.
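The two percentage lifts quoted above are each a fitted slope divided by a baseline conversion rate. The short sketch below reproduces that arithmetic from the figures given in the text; the function name `percent_lift` is an illustrative assumption.

```python
def percent_lift(slope, baseline_rate):
    """Relative change in conversion rate for a one-unit shift in the measure."""
    return slope / baseline_rate


# +1.0 shift in tratio, relative to the average conversion rate.
tratio_lift = percent_lift(0.0031, 0.005078)

# +1000 Imp/MHH/Wk, relative to the conversion rate at 0 media weight.
weight_lift = percent_lift(0.000248, 0.00192)

print(f"tratio: {tratio_lift:.1%}, weight: {weight_lift:.1%}")
```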

12. COMPARISON TO EXPERIMENTAL RESULTS

We next performed an independent experimental test to measure the same relationships, using a matched-market treatment-control design. Television media was run in 5 direct marketing areas, and not run throughout the remainder of the United States. We aired 9,687 television airings in these markets, at a cost of $383,949. The expense involved in running this experiment meant that only 5 markets could be used. The treatment areas were areas of approximately 500,000 households and were chosen because of their close match to the US demographic composition. The control areas were chosen based on spatial closeness, demographic match and historical sales match to each treatment area.

The five areas are shown in Figures 15 and 16. The design was as follows: Spokane, Albany, Dayton and Huntsville were all purchased at a tratio of approximately 0.13, and with successively higher weight levels. Cedar Rapids was purchased at the same weight level as Huntsville but at a tratio of 0.26.

As a result, we had 4 experimental cells at successively higher weight, along with national at 0 weight, to measure the impact of higher impressions.

We also had 2 experimental cells at a low and high level of targeting, with Huntsville doubling as both a low targeting cell and also as a media weight cell. Media was applied starting 9/18/2011 and halted on 10/31/2011.

Figure 15: Treatment and control areas

| Market | Airings | StaGross | HHImps | tImps | tRatio | tCPM | HHCPM |
|---|---|---|---|---|---|---|---|
| ALBANY,NY | 1609 | $111,886 | 8,992,195 | 1,067,265 | 0.1187 | $104.83 | $12.44 |
| CEDAR RAPIDS,IA | 2806 | $107,496 | 3,820,144 | 899,428 | 0.2354 | $119.52 | $28.14 |
| DAYTON,OH | 828 | $25,640 | 1,665,025 | 214,985 | 0.1291 | $119.26 | $15.40 |
| HUNTSVILLE,AL | 1473 | $26,312 | 3,291,964 | 402,478 | 0.1223 | $65.38 | $7.99 |
| SPOKANE,WA | 2969 | $112,615 | 7,356,192 | 921,283 | 0.1252 | $122.24 | $15.31 |
| YAKIMA,WA | 2 | $0 | 6,572 | 299 | 0.0455 | $0.00 | $0.00 |
| Grand Total | 9687 | $383,949 | 25,132,093 | 3,505,738 | 0.1395 | $109.52 | $15.28 |

Figure 16: TV impressions injected into each area
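The derived columns in Figure 16 appear to be consistent with standard definitions: cost per thousand impressions (CPM) as gross spend divided by impressions times 1000, and tRatio as targeted impressions divided by household impressions. These definitions are an inference from the reported numbers rather than something stated in the text, and the function names are illustrative. The sketch below checks them against the Albany row.

```python
def cpm(gross_dollars, impressions):
    """Cost per thousand impressions; 0.0 when there are no impressions."""
    if impressions == 0:
        return 0.0
    return 1000.0 * gross_dollars / impressions


def impression_ratio(t_imps, hh_imps):
    """Share of household impressions that are targeted impressions (assumed)."""
    return t_imps / hh_imps


# Albany, NY row from Figure 16.
gross, hh_imps, t_imps = 111_886, 8_992_195, 1_067_265
albany_tcpm = cpm(gross, t_imps)
albany_hhcpm = cpm(gross, hh_imps)
albany_tratio = impression_ratio(t_imps, hh_imps)
print(round(albany_tcpm, 2), round(albany_hhcpm, 2), round(albany_tratio, 4))
```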

Figure: Control area selection for Spokane

| geoarea1 | geoarea2 | avgdiff | census rank | greatcircledistance | dist rank | overall rank |
|---|---|---|---|---|---|---|
| SPOKANE,WA | HELENA,MT | 0.02 | 8 | 261.44 | 5 | 3.1 |
| SPOKANE,WA | MEDFORD,OR | 0.01 | 1 | 437.91 | 14 | 4.4 |
| SPOKANE,WA | BILLINGS,MT | 0.01 | 3 | 450.95 | 15 | 5.1 |
| SPOKANE,WA | IDAHO FALLS,ID | 0.02 | 10 | 364.33 | 12 | 5.6 |
| SPOKANE,WA | CASPER,WY | 0.01 | 2 | 564.68 | 20 | 6.4 |
| SPOKANE,WA | EUGENE,OR | 0.02 | 16 | 386.73 | 13 | 7.1 |
| SPOKANE,WA | BUTTE,MT | 0.03 | 34 | 275.08 | 7 | 8.9 |
| SPOKANE,WA | CHICO-REDDING,CA | 0.02 | 25 | 541.48 | 16 | 9.8 |
| SPOKANE,WA | BEND,OR | 0.03 | 38 | 303.08 | 9 | 10.3 |
| SPOKANE,WA | RAPID CITY,SD | 0.02 | 23 | 708.19 | 24 | 11.8 |
| SPOKANE,WA | TWIN FALLS,ID | 0.03 | 43 | 361.53 | 11 | 11.9 |

Figure: Controls for Spokane

Figure: Treatment and control market sales per day over a full year prior to the start of the experiment, with 1.0 being the average sales per day prior to the experiment. The treatment and control areas match each other in terms of % movement over their average. Starting 9/18/2011, the treatment areas diverge dramatically, lifting to over twice the average sales per day of the control areas. The close match prior to the start of the experiment provides strong validation that the treatment and control areas are well matched, and that other factors are consistent between these areas. The only difference in the treatment areas is the application of TV starting 9/18/2011.

[Figure 17 panels. Left: lift versus control (axis 0 to 1.4), data labels 0, 0.16, 0.37, 0.78, 1.28. Right: lift versus control (axis 0 to 0.80), data labels 0.37, 0.74.]

Figure 17: (left) Weight cells versus lift, and (right) targeting cells versus lift

The experimentally measured lifts (Figure 17) give us the following basic results for tratio and weight around the mean tratio and 0 media weight case respectively:

$$E(\mathrm{Conv} \mid \mathrm{tratio}) = 2.8 \cdot r(i, \alpha, \tau) + 0.0074$$

$$E(\mathrm{Conv} \mid \mathrm{weight}) = 0.00036 \cdot A(i, \alpha, \tau) - 0.03883$$

These equal 280% per +1 shift in tratio, and 36.67% per 1000 Imp/MHH/Wk for weight. These are larger numbers than what we inferred from our Set Top Box data. However, the Set Top Box estimates do not include our correction factor b=0.5, and applying it brings the two sets of estimates closer. A summary comparison is shown in Figure 18. In summary, the Set Top Box methods under-count the experimentally measured lift. TV advertising is known to produce a wide range of effects including delayed conversions. It is possible that the difference might be accounted for by these delayed conversions filtering into later times. Further work will be needed to see if we can refine the Set Top Box methods to produce a closer estimate.

% Increase in Conversions

| Change in Measure | STB | STB adj. | Exp. Design |
|---|---|---|---|
| +1000 Imp/MHH/Wk | 12.9% | 25.8% | 36.67% |
| +1.0 tratio | 61.0% | 122.0% | 280% |

Figure 18: Comparison of lift estimates using different methods
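The "STB adj." column in Figure 18 is consistent with dividing each Set Top Box estimate by the correction factor b=0.5. That reading is an inference from the reported values, and the function name below is hypothetical; the sketch simply reproduces the comparison.

```python
def adjust_for_undercount(stb_lift_pct, b=0.5):
    """Scale an STB lift estimate by 1/b (assumed form of the correction)."""
    if b <= 0:
        raise ValueError("b must be positive")
    return stb_lift_pct / b


# (measure, STB estimate in %, experimental estimate in %) from Figure 18.
rows = [("+1000 Imp/MHH/Wk", 12.9, 36.67), ("+1.0 tratio", 61.0, 280.0)]
for measure, stb, experiment in rows:
    adjusted = adjust_for_undercount(stb)
    print(f"{measure}: STB {stb}%, adjusted {adjusted:.1f}%, experiment {experiment}%")
```

Even after adjustment, both Set Top Box estimates remain below the experimentally measured lift, which matches the under-counting noted in the text.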

13. DISCUSSION

Many authors have proposed that in television, impressions should be considered the currency for advertising. Irwin Gotlieb, CEO of GroupM, which is the largest advertising agency in the world, criticized the proliferation of different metrics that are becoming available online to measure consumer behavior, and has called for the industry to adopt standardized metrics such as the GRP as the general advertising currency (Gotlieb, 2012). Steve Hasker from Nielsen says “they’ve created a currency” (Goetzl, 2012; Nielsen 2012). Charles Gabriel, VP of Sales at AOL, said “Nielsen is the de facto standard in terms of measurement and audience delivery.... it makes sense to have them create that common currency across both mediums.” (Rodgers and Kaplan, 2012). David Poltrack, Chief Research Officer with CBS, says that Nielsen will “remain the currency for the near future” (Worden, 2011). Rentrak recently reported that “[Rentrak provide...] television ratings currency” (Rentrak, 2012).

The use of the term “currency” is suggestive. Currencies as defined in classic economic theory, are mediums of exchange, and modern currencies have a variety of features including anti-counterfeiting devices, to ensure that the face value remains undebased (Bernstein, 2008). If impressions really are a currency, then Nielsen’s pseudo-monopoly in panel-based audience measurement (Worden, 2011) may put it into the position of owning the “printing presses” for this new currency.

However the discussion about currency ignores some key facts. In order to fit the classic economic definition of being a currency, impressions need to meet several criteria (Bernstein, 2008): (a) Portability, (b) Durability, (c) Divisibility, (d) Countability, (e) Fungibility, (f) Reliable transmission of value.

Fungibility and value are where we get into trouble. Fungibility means that different pieces need to carry the same value. For example, in the real world diamonds are not a good currency because their value can be subjective.

Unfortunately not all impressions are created equal. We have shown in this paper that we could take the same impression, and if we put a well-targeted ad in front of it, we can increase the lift for an advertiser of buying that impression by 10x or 20x.

Thus it seems that impressions simply do not faithfully store or represent value. Treating impressions like a currency is like a central bank printing paper money without any denominations on their paper bills – blank bills. Attempting to use this kind of currency would result in inefficient exchange since each bill is like a lottery ticket.

Impressions may better fit a different economic concept – that of being “Goods” (Krugman and Wells, 2006). “Goods” are assets that can be purchased by different entities, and may be valued differently by those different entities.

Goods still need to meet certain minimum standards, and likewise the industry needs minimum standards for an impression to have the possibility of conveying value. Impressions should not be fraudulent – that is they were delivered, rendered, able-to-be-viewed by a person, not accidentally hidden outside of the viewing screen, and so on.

The Media Rating Council (MRC) was formed to ensure that companies meet these minimum standards. For example, the 2009 IAB/MRC Click Measurement standards (IAB, 2009; Microsoft, 2009) were developed in response to Click Fraud (Kitts et al., 2006), to find the threshold at which clicks have no value and ensure that advertisers are not paying for those. In this role the MRC is somewhat like the US Food and Drug Administration (FDA), ensuring that food being sold for consumption by advertisers is at least not poisonous (poisonous is an apt analogy, as click fraud could otherwise literally send an advertiser bankrupt). Therefore advertisers are assured some safety, even though they have no guarantee around the value of the product.

It is also true that publishers may elect to decline to sell some assets which, while not being fraudulent, are substantially lower value than their own voluntary market standards. In online advertising, Smartpricing (Adwords, 2012) is used to dynamically adjust the value of assets to ensure that they meet a particular internal standard of value.

However beyond this, each impression is a good for sale, and advertisers need to perform valuation in order to determine its value.

Thus, the critical task for each advertiser is to be able to value the collections of impressions that are being offered for sale by publishers (programs, rotations, etc) and determine an offering price for each that is lower than or equal to the value.

The lack of technologies for conversion tracking on TV has led to a situation in which television advertising exhibits terrible relevance.

Without the ability to know whether Fringe or Football is better for buyers, Advertisers have proliferated large numbers of ads without targeting.

The impact of solving this problem should be significant. In online systems the presence of conversion tracking creates a virtuous cycle in which poor ads are removed and better ads promoted. This has not only resulted in Google generating 37.9 billion dollars per year (Google, 2012); Jansen and Resnick (2006) also reported that for commercial searches online, there is no statistically significant difference between the relevance of paid listings and organic listings. This is remarkable when one considers that just letting advertisers track conversions and bid accordingly produces a system as good as the culmination of 50 years of information retrieval research. Improved tracking and targeting could arguably lead to a similar impact in TV: a reduction in the number of ads being shown, an increase in their relevance, an improvement in consumer satisfaction, higher prices for publishers, and more satisfaction from users about their TV networks since there are fewer and shorter breaks, and the breaks are relevant.

14. CONCLUSION

We have described efforts towards measuring conversions generated from television. The key approach is to join Set Top Box viewers with Advertiser conversion data, and then to measure the ad exposure of those individuals prior to the conversion event. This strategy is similar to view conversions in e-commerce. We demonstrated the method on a large advertiser and compared it to the results of a matched-market television experiment costing nearly $400,000. While the present paper is not a finalized result, we believe that the results suggest that the approach may have merit. We believe that the foundation developed in this article will help other researchers and the field in general to further refine these techniques towards a conversion tracking capability for television using set top box viewing data.

ELECTRONIC APPENDIX

The electronic appendix for this article can be accessed in the ACM Digital Library.

ACKNOWLEDGMENTS

REFERENCES

AdWords (2012), What is Smartpricing?, Adwords Help, accessed June 30, 2012, http://support.google.com/adwords/bin/answer.py?hl=en&answer=2604607

Anders, C. (2010), How the Nielsen TV ratings work — and what could replace them, http://io9.com/5636210/how-the-nielsen-tv-ratings-work--and-what-could-replace-them

Angrist, J. and Pischke, J. (2010), Mostly Harmless Econometrics, Princeton University Press.

Berendt, B., Mobasher, B., Spiliopoulou, M., Wiltshire, J. (2006), Measuring the Accuracy of Sessionizers for Web Usage Analysis, Technical Report TR01-006, Humboldt-University Berlin.

Bernstein, P. (2008), A Primer on Money, Banking and Gold (3rd ed.). Hoboken, NJ: Wiley.

Chandler-Pepelnjak, J. and Song, Y. (2012), Optimal Frequency: The Impact of Frequency on Conversion Rates, Atlas Institute Digital Marketing Insights, http://atlassolutions.com/wwdocs/user/atlassolutions/en-us/insights/OptFrequency.pdf

Duggan, B. (2012), Industry Interest in Brand-Specific Commercial Ratings, Association of National Advertisers, Blog, June 1, 2012, http://www.ana.net/blogs/show/id/23617

Department of Homeland Security, Privacy Policy Guidance Memorandum.

European Union Data Protection Directive, Directive 95/46/EC

FTC (2000a), FTC 2000 Privacy Report, http://www.ftc.gov/reports/privacy2000/privacy2000.pdf

FTC (2000b), FTC Fair Information Practice Principles, http://www.ftc.gov/reports/privacy3/fairinfo.shtm

FTC (2008), Federal Trade Commission, Privacy Online: A Report to Congress, http://www.ftc.gov/reports/privacy3/toc.shtml

Gellman, R. (2008), Fair Information Practices: A Brief History, http://bobgellman.com/rg-docs/rg-FIPshistory.pdf

Goetzl, D. (2012), Nielsen To Analysts: We've Created A 'Currency,' Says GroupM Will Use OCR To Guarantee Ad Buys, Online Media Daily, May 18, 2012, http://www.mediapost.com/publications/article/175094/nielsen-to-analysts-weve-created-a-currency-s.html#ixzz1zJgfUjFZ

Google (2012), SEC Filings.

Gotlieb, I. (2012), Audience Measurement 7 conference, The Advertising Research Foundation.

IAB (2009), IAB Click Measurement Guidelines Version 1.0 Final Release, May 12 2009. http://www.iab.net/media/file/click-measurement-guidelines2009.pdf

IAB (2012), Platform Status Report: Interactive Television Advertising, http://www.iab.net/media/file/ITV_Platform_Status_Report.pdf

Jansen, B. J. and Resnick, M. (2006), An examination of searcher's perceptions of non-sponsored and sponsored links during ecommerce Web searching, Journal of the American Society for Information Science and Technology, 57(14), pp. 1949-1961.

Jansen, B. (2007), The Comparative Effectiveness of Sponsored and Non-Sponsored Links for Web Ecommerce Queries, ACM Transactions on the Web, Vol. 1, No. 1.

Kitts, B., LeBlanc, B., Meech, R., and Laxminarayan, P. (2006), Click Fraud, Bulletin of the American Society for Information Science and Technology, Vol. 32, No. 2. Wiley, pp. 23-24.

Kitts, B., Wei, L., Au, D., Zlomek, S., Brooks, R., Burdick, B. (2010) “Targeting Television Audiences using Demographic Similarity”, Applications of Data Mining and Modeling in Government and Industry Workshop (ADMMGI2010) in Workshop Proceedings of the Tenth IEEE International Conference on Data Mining (ICDM), IEEE Computer Society Press.

Kokernak, M. (2010), What's Television's Next Business Model? Media Post Daily News, Wednesday, March 17, 2010 http://www.mediapost.com/publications/?fa=Articles.showArticle&art_aid=124424

Krebs, B. (2012), Massive Credit Card Breach of Estimated 10 Million Accounts, Forbes, March 31, 2012. http://www.forbes.com/sites/anthonykosner/2012/03/31/massive-credit-card-breach-of-estimated-10-million-accounts-where-are-those-smart-cards/

Krugman, P. and Wells, R. (2006), Economics, Worth Publishers, New York.

Lambert, D. and Pregibon, D. (2008), Online effects of Offline Ads, Proceedings of the Second International Workshop on Data Mining and Audience Intelligence for Advertising, ACM Press, NY.

Loftin, J. (2012), Utah Breach Affects 25,000 Social Security Numbers, MSNBC, April 9, 2012.

Manadhata, P. and Wing, J. (2004), Measuring a System’s Attack Surface, Technical Report CMU-CS-04-102, School of Computer Science, Carnegie Mellon University, http://www.cs.cmu.edu/~wing/publications/tr04-102.pdf

Mcclellan, S. (2008), New Clients Embrace DRTV as Sales Soar, AdWeek, August 25, 2008 http://www.adweek.com/news/television/new-clients-embrace-drtv-sales-soar-96745

McDonough, P. (2012), The Evolution of The Video Consumer, Audience Measurement 7, The Audience Research Foundation, Nielsen Corporation.

Microsoft, (2009), adCenter Click Measurement: Description of Methodology (DOM), http://advertising.microsoft.com/small-business/product-help/adcenter/topic?query=moonshot_conc_clickmeasurementdom.htm

MRC (2012), MRC Multi-Channel Digital Video Data Capture, Accumulation and Processing Guidelines, Media Ratings Council, June 1, 2012

Nelson-Field, K., Riebe, E. and Sharp, B. (2012), What’s not to Like? Can a Facebook Fan Base Give a Brand the Advertising Reach it Needs? Journal of Advertising Research, Vol. 52, No. 2. pp. 262-269

Nielsen Corporation (2009), Three Screen Report: Television, Internet and Mobile Usage in the US, Vol. 7, Fourth Quarter 2009, http://blog.nielsen.com/nielsenwire/wp-content/uploads/2010/03/3Screens_4Q09_US_rpt.pdf

Nielsen Corporation (2012), Nielsen Corporation New Zealand Website. http://www.nielsenmedia.co.nz/company_info.asp

OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data, http://www.oecd.org/document/18/0.3343,en_2649_34255_1815186_1_1_1_1,00.html)

Popkin, H. (2012), LinkedIn confirms password leak, eHarmony has one too, MSNBC, June 6, 2012, http://www.technolog.msnbc.msn.com/technology/technolog/linkedin-confirms-password-leak-eharmony-has-one-too-816238

Rentrak (2012a), Rentrak TV Essentials Product Sheet, http://www.rentrak.com/downloads/12-0716_TVE-FactSheet_01.pdf

Rentrak (2012b), Rentrak Signs TV Station Ratings Agreement with Morris Network Inc. Rentrak Press Release, June 4, 2012, http://investor.rentrak.com/releasedetail.cfm?ReleaseID=679459

Rodgers, Z. and Kaplan, D. (2012), In Grab For Brand Dollars, AOL Baits Hook With Nielsen’s Online GRP, AdExchanger, April 18, 2012, http://www.adexchanger.com/online-advertising/in-grab-for-brand-dollars-aol-baits-hook-with-nielsens-online-grp/

Saita, A. (2012), UNC-Charlotte Data Breaches Expose 350,000 Social Security Numbers and Much More, Kapersky Lab Security News Service, May 20, 2012. http://threatpost.com/en_us/blogs/unc-charlotte-data-breaches-expose-350000-social-security-numbers-and-much-more-051012

Schneider, M. (2009), Fox wants answers from Nielsen, http://www.variety.com/article/VR1118003924?refCatId=14 , Variety, May 18, 2009

Segal, A. (2007), Nielsen Ratings: An Inaccurate Truth Out of date television ratings system exposed, http://cornellsun.com/node/23180, The Cornell Daily Sun, April 26, 2007

Shade, L. (2008), Reconsidering the right to privacy in Canada, Bulletin of Science, Technology and Society, Vol. 28, No. 1., pp. 80-91.

Tellis, G., Chandy, R., MacInnis, D., Thaivanich, P. (2005), “Modeling the Microeffects of Television Advertising: Which Ad Works, When, Where, for How Long, and Why?”, Marketing Science 24(3), pp. 351–366, INFORMS. http://www-rcf.usc.edu/~tellis/AdMicro.pdf

Thompson, D. (2012), Names, Social Security Numbers Exposed in UNF Data Breach, Jacksonville Business Journal, June 11, 2012.

Worden, N. (2011), Nielsen's Post-IPO Challenge: Preserving Ratings Monopoly, Wall Street Journal, Jan 25, 2011, http://online.wsj.com/article/SB10001424052748704698004576104103397970050.html

Received February 2017

Online Appendix to Towards a Conversion Tracking system for Television that uses Set Top Box data and Conversion timestamps

BRENDAN KITTS, PRECISIONDEMAND.

A. PII ANONYMIZATION METHODS USED IN CURRENT WORK

Privacy is important for practical and philosophical reasons (Shade, 2008).

In the United States, privacy is regulated by a patchwork of laws and standards including the Children’s Online Privacy Protection Act (COPPA), the Gramm-Leach-Bliley Act, the Health Insurance Portability and Accountability Act (HIPAA) and PCI DSS. Many authors have expressed concern over the lack of regulation in this area in the United States and have called for legislation.

In the past several years there has been a deluge of information disclosures:

(i) In March 2012, Krebs (2012) reported that a breach at Global Payments Inc., a subcontractor for Mastercard and VISA, had compromised an estimated 10 million accounts.
(ii) In April 2012, Loftin (2012) reported that 182,000 persons had their information stolen from the Utah Department of Health servers.
(iii) In May 2012, Saita (2012) reported that 350,000 social security numbers were released to the public by the University of North Carolina at Charlotte due to incorrectly configured internet settings.
(iv) In June 2012, Thompson (2012) reported that 23,246 names were disclosed by the University of North Florida.

Although they are currently a set of voluntary guidelines, the FTC’s Fair Information Practices (2000) represent widely accepted industry best practice for privacy and are applicable across domains. These standards emerged after over a decade of hearings, and represent the most likely candidate for US legislation.

The FTC Fair Information Practice proposes four principles: (a) Notice, (b) Consent, (c) Access and (d) Security. Notice ensures that consumers are notified of information practices prior to collection including the types of data that will be collected, and the uses to which data will be put. Consent means that consumers should provide their consent for data collection to take place, typically through accepting End User Licensing Agreements. Access refers to the ability for consumers to view data collected about them and contest its accuracy. Security refers to physical and procedural controls to ensure the data is protected.

The Set Top Box data and advertiser data that we use comply with these guidelines. However, we believe that we can go further.

More than just ensuring consent, notice, access and security, we specifically want to limit the dissemination of information, so that it cannot possibly be used for direct marketing.

We call the method “Triple Anonymization”. The method makes use of an Anonymization service which already houses Personally Identifiable Information (PII). We use this service as a “clean room” so that data can be re-keyed and stripped of PII such as name and address, without revealing any usable identifying information. The method involves several steps.

14.1 Conversion File Processing

2.1.1. Advertiser. Advertisers have a list of Converters. The Advertiser sends two files: (i) a file containing their internal CustomerID, date of conversion, and information about what the customer bought, which is sent directly to the STB Targeting Company and does not include any PII (Figure 19); and (ii) a file with the internal CustomerID and Personally Identifiable Information (name and address), but no behavioral, purchase or conversion information, which is sent to the Anonymizer service (Figure 19).

2.1.1. Anonymizer. The Anonymizer receives the file with internal CustomerID, PII and Token. It then searches for any demographic information about the PII and then adds that to an output record. It passes on the CustomerID, Demographics, AnonymousID, Token to the Set Top Box Targeting company. No PII is sent to the Set Top Box Targeting company. (Figure 19).

2.1.1. STB Targeter. The STB Targeting Company receives two files (1) CustomerID and Purchase information from the Advertiser directly, and (2) CustomerID, AnonymousID, Token, and Demographic Information from the Anonymizer. Both files reach it and it joins them together on CustomerID to create a full view of the persons who are converting. The STB Targeting Company has detailed converter transaction information, but does not know the identities of these people as no PII was sent to it.

Figure 19. Advertiser anonymization process showing the two files that the advertiser sends to the STB Targeter.
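The conversion file flow described in the three steps above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the production process: the class and function names, the record fields, and the use of a random UUID as the AnonymousID are all hypothetical, and the Token is omitted because its role is not described here.

```python
import uuid

PII_FIELDS = ("name", "address")


class Anonymizer:
    """Hypothetical clean room: holds PII and hands back re-keyed records."""

    def __init__(self, demographics):
        self._demographics = demographics  # (name, address) -> demographics
        self._ids = {}                     # (name, address) -> AnonymousID

    def rekey(self, pii_rows):
        out = []
        for row in pii_rows:
            key = tuple(row[f] for f in PII_FIELDS)
            # The same person always receives the same AnonymousID.
            anon_id = self._ids.setdefault(key, uuid.uuid4().hex)
            out.append({
                "customer_id": row["customer_id"],
                "anonymous_id": anon_id,
                "demographics": self._demographics.get(key, {}),
            })  # note: no name or address is copied to the output
        return out


def join_at_targeter(behavior_rows, rekeyed_rows):
    """STB Targeter joins the two PII-free files on CustomerID."""
    by_customer = {r["customer_id"]: r for r in rekeyed_rows}
    return [{**b, **by_customer[b["customer_id"]]}
            for b in behavior_rows if b["customer_id"] in by_customer]


# File (i): behavior only, sent straight to the STB Targeter.
behavior = [{"customer_id": "C1", "conversion_date": "2011-10-01",
             "product": "album"}]
# File (ii): PII only, sent to the Anonymizer.
pii = [{"customer_id": "C1", "name": "Jane Doe", "address": "1 Main St"}]

anonymizer = Anonymizer({("Jane Doe", "1 Main St"): {"age_band": "25-34"}})
joined = join_at_targeter(behavior, anonymizer.rekey(pii))
print(sorted(joined[0].keys()))
```

In this sketch the joined record carries purchase behavior, demographics and an AnonymousID, but neither of the PII fields, which mirrors the separation of knowledge that the method is designed to enforce.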

14.2 STB File Processing

Anonymization of the STB file follows the same procedure, except that instead of an Advertiser, the STB Collector is the client, and instead of sending a Conversion file, STB View transactions are being provided.

Figure 20 shows the three entities involved in the Triple Anonymization scheme, and the information that they possess. None of the companies can associate identity and behavior.

| Entity | PII | Conversion Behavior | STB Behavior |
|---|---|---|---|
| Anonymizer | Y | N | N |
| STB Collector | Y | N | Y |
| Advertiser | Y | Y | N |
| STB Targeter | N | Y | Y |

Figure 20. Information known by each of the four entities in the Anonymization process.

14.3 Summary

As the breaches discussed at the beginning of this section show, the fewer people that have access to PII, the better for everyone (Manadhata and Wing, 2004). Many of the breaches discussed, including the Global Payments breach of 10 million names, are breaches that we believe could have been prevented by a PII-stripping architecture. Simply stated, Global Payments shouldn’t have received the PII in the first place; they should have been working with anonymous IDs. The present architecture keeps PII housed with the large primary organizations that have a direct relationship with the consumer, and whose reputation and continued viability with their customers depend on protecting that data. It ensures accountability, a reduction in attack surface, and centralization into better protected security architectures.
