DR INVESTIGATIVE DATA TEAM Contribution from DR DATA TEAM to the Data Journalism Award 2014 / Data journalism portfolio (team/newsroom)

Contribution from DR Data Team


The Data Journalism Award 2014 Data Journalism Portfolio (team/newsroom)



In this portfolio you will find a selection of the stories and illustrations we have published since we first went “on air” on November 1st 2013.

For a full overview of our production, please visit this webpage: http://www.pinterest.com/katrinefrich/drs-unders%C3%B8gende-databaseredaktion/

All articles are in Danish.

The DR Investigative Data Team was launched October 1st 2013. We are one editor, two journalists, one graphics designer and one programmer. We do all parts of data journalism ourselves: from scraping data off the web and using the Freedom of Information Act to dig out data and documents from public administration, to selecting, sorting, refining, filtering and analyzing data, to the final visual and editorial presentation.

Portfolio

We live by two mottos when we select data for stories.

1. We want data to have relevance to our readers. If we use lots of resources on interactive graphics, we want our readers to find something useful when they click on the device.

2. We only do the story if we find news. No news, we reject the data.

Katrine Birkedal Frich

Editor

Mads Rafte Hein

Graphics designer

Kresten Morten Munksgaard

Datajournalist

Bo Elkjær (Skipper)

Datajournalist

Jens Lykke Brandt

Programmer


Tax authorities misvalued property – to the benefit of rich people and the disadvantage of poor people

One of our very first stories was based on complicated calculations gone wrong at the Danish tax department. For a while, many house owners in Denmark had been complaining that the valuations made by the tax authorities (which were the basis for the property tax paid by house owners) were out of touch with market values. When houses were sold, the sale price was far from the valuation made by the tax authorities. The consequence was that house owners paid too much (or too little) property tax.

This information had been well reported by Danish media. But the Database Team was wondering: did this misjudgment by the authorities affect house owners equally? We decided to investigate that question by filing a freedom of information request with the National Audit Office, which was investigating the scandal at the tax department. We got a data file with 12.000 rows of information about each of the houses sold in the second half of 2011.

With this data we analyzed the prices of the houses by sorting them into 5 categories (from low price to high price). Within each of these categories we were able to analyze how many houses were sold below or above the price estimated by the tax authorities. Our analysis showed that most of the lower-priced houses (where people with low incomes live) were valued far too high, with the potential consequence that these people are paying too much tax, while houses that sold high on the market were valued too low, with the potential consequence that rich people paid too little property tax. An inverted “Robin Hood”, as one of our sources called it.
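The banding step described above can be sketched in a few lines of Python. This is only an illustration, not the team's actual Excel workflow: the field names `sale_price` and `tax_valuation` are hypothetical, and we assume the five categories were equal-sized groups of houses ordered by sale price, which the text does not state.

```python
def valuation_bias_by_price_band(sales, bands=5):
    """Split sales into price bands and report, per band, the share of
    houses whose tax valuation was higher than the actual sale price.

    `sales` is a list of dicts with the hypothetical keys
    'sale_price' and 'tax_valuation'.
    """
    ordered = sorted(sales, key=lambda s: s["sale_price"])
    n = len(ordered)
    result = []
    for b in range(bands):
        # Integer slicing keeps the bands as equal in size as possible.
        group = ordered[b * n // bands:(b + 1) * n // bands]
        over = sum(1 for s in group if s["tax_valuation"] > s["sale_price"])
        result.append({
            "band": b + 1,  # 1 = cheapest houses, `bands` = most expensive
            "houses": len(group),
            "share_overvalued": over / len(group) if group else 0.0,
        })
    return result
```

A falling `share_overvalued` from band 1 to band 5 would correspond to the pattern reported in the story: cheap houses valued too high, expensive houses valued too low.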

Besides telling this story in words, we told it in an interactive graphic where our readers were able to select their own municipality and see how the tax authorities valued houses in their own area.

http://www.dr.dk/Nyheder/Politik/2013/10/16/202745.htm

http://www.dr.dk/Nyheder/Politik/2013/10/16/212744.htm

http://www.dr.dk/Nyheder/Politik/2013/10/16/213712.htm

http://www.dr.dk/Nyheder/Politik/2013/10/16/204230.htm

What data did we analyze?

1. Number of citizens split by municipality

2. 12.000 rows of data on houses sold and valued by the tax authorities

Data in Excel            Rows     Columns   Total number of cells
Raw data from database   16.249   14        227.486
Selected and refined     12.202   14        170.828

Figure: national average and municipality average (exemplified by Brøndby)

Published October 2013. Time spent on this story: 6 days. Readers since publishing: 61.166

All of these data were sorted, refined, combined, calculated and analyzed by the team.


How many leaders does your municipality need?

Why does one municipality need eight bosses to lead 100 employees, when another municipality can do it with four? That’s the question raised by our investigation into the heads of public administration.

We chose to look into the number of leaders in the local administration of municipalities. At a time when money is scarce and all public services are subject to cuts, we found it interesting to investigate through data whether public leaders hold themselves as accountable to cuts as they hold the employees at public schools, public daycare etc.

We therefore chose different sets of data to compare the municipalities and made an interactive device (for mobile as well as web) that made it easy for readers to choose their own municipality and study not only the number of leaders, but also to compare it with the service level, the salary of the leaders, and the number of leaders relative to citizens and to employees (please go to the web to see the full extent of the interactive graphics).
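The comparison between municipalities comes down to a few ratios. A minimal sketch, assuming one row per municipality with counts of leaders, employees and citizens; the function name and the per-1.000-citizens scaling are our own choices for illustration, not taken from the published graphic.

```python
def leader_ratios(leaders, employees, citizens):
    """Normalize a municipality's leader count so that large and small
    municipalities can be compared directly."""
    return {
        # "Eight bosses to lead 100 employees" corresponds to 8.0 here.
        "leaders_per_100_employees": 100 * leaders / employees,
        "leaders_per_1000_citizens": 1000 * leaders / citizens,
    }
```

For example, `leader_ratios(8, 100, 10000)` describes the first municipality in the question above, with a made-up population of 10.000.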

What data did we analyze?

1. Number of citizens split by municipalities

2. Number of people employed in municipalities

3. Level of service in the municipality (an indicator calculated by a government-appointed commission)

4. Number of leaders

All of these data were sorted, refined, combined, calculated and analyzed by the team.

Data in Excel            Rows   Columns   Total number of cells
Raw data from database   556    98        54.488
Selected and refined     99     98        9.702

Links to article:

http://www.dr.dk/Nyheder/Politik/KV13/Artikler/Hele_landet/2013/11/06/200141.htm

http://www.dr.dk/Nyheder/Politik/KV13/Artikler/Hele_landet/2013/11/06/195829.htm

http://www.dr.dk/Nyheder/Politik/KV13/Artikler/Hele_landet/2013/11/06/175643.htm


Published November 2013. Time spent on this story: 5 days. Readers since publishing: 79.790


The soil is toxic

One of our first projects was to map and illustrate poisoned soil in Denmark. To the citizens of Denmark it is not news that some of our soil is contaminated with waste and chemicals from, among other sources, old industry, dry cleaning and gas stations. But few of us really know the extent of the pollution. And few know the exact locations of the poisoned soil in the landscape.

To the Data Team the challenge was to show readers exactly where the poisoned soil is located. Is it near your house? Is it at the kindergarten playground? Or in the forest where you walk your dog every day? Those were a few of the questions we wanted to answer, not only by writing stories but also by showing it on a map where readers were able to click and study the polluted areas of soil in Denmark.

Even though a large part of Denmark is classified as toxic, the subject has never been given much attention in media or politics. As a result, public spending on cleaning polluted soil is very small (55 million euro). The responsible administration has in fact stated that with the given budget it would take more than 50 years just to clean the soil that is currently considered necessary to clean in order to keep the groundwater drinkable and to keep people living near toxic areas from getting ill.

What did we do?

We decided to create a map of the poisoned soil. But instead of overloading our readers with all the information at once, we released different layers of information on the map a few days apart. We decided to run three different layers of the map, with a set of different stories to accompany the map.

First map:

The first iteration of the map included all registered areas that are classified by the authorities as either contaminated or ‘likely contaminated’. The areas are shared with the public in the form of shapefiles, a mostly open file format for storing geographic information.

The files were converted to KML, another file format used by Google products, and imported both into a database for further data analysis and into Google Fusion Tables for visualization. We had to write our own program for the import into the database.

Published November 2013. Time spent on this story: 20 days. Readers since publishing: 258.878

In the database we could run queries against other areas and points, such as the positions of schools, daycare centers et cetera.

In Google Fusion Tables we merged the areas with additional information for each area. We had obtained extended information on the contaminated areas through several requests to the authorities under the Freedom of Information law.

Finally we put all the information into an interactive Google map, where users could zoom, pan and click on areas to get the extra information. We made a big effort to make the map work on mobile devices and altered several UI elements to accomplish this.

The first map showed data on 29.000 areas in Denmark where the soil is polluted or suspected of being polluted.

Second map:

The second iteration focused on the most contaminated and most expensive areas. Data was gathered from multiple sources and enriched by several more requests to the authorities.

The extra data was put on the map as icons using the Google Maps API. To find the center of each area we had to construct a database query that could give us the exact point.

Each of these icons reveals detailed information on plans and costs, past and future.
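Finding a point at which to place an icon for an area is a centroid calculation. The team did this with a database query that is not shown; the Python below is only a sketch of the same idea using the standard area-weighted centroid formula for a simple polygon. It treats coordinates as planar, which is a reasonable approximation for small plots, and the centroid of a strongly concave area can fall outside the area itself.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon.

    `ring` is a list of (x, y) vertices in order; the first vertex may or
    may not be repeated at the end.
    """
    area2 = 0.0  # twice the signed area of the polygon
    cx = cy = 0.0
    # Pair every vertex with the next one, wrapping around at the end.
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon: zero area")
    return cx / (3 * area2), cy / (3 * area2)
```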

Third map:

The last iteration added Natura 2000 areas. Natura 2000 is the European Union's network of nature areas that are designated as needing special conservation and protection.

A shapefile from the European Environment Agency holding all Natura 2000 areas in Europe was converted to KML and parsed by another program we created to keep only the Danish areas: this shrank the KML file from 1.800 MB to 6 MB. These were imported into both the database and Google Fusion Tables.
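The filtering step can be illustrated with Python's standard XML library. This is a sketch under stated assumptions, not the team's program: we assume each `Placemark` carries a `name` element holding a site code that starts with the country code `DK`, which may not match how the real file was laid out. It also parses the whole document in memory, whereas a file of the size described would more likely be streamed.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def keep_placemarks(kml_text, prefix="DK"):
    """Return the KML document with every Placemark removed whose <name>
    does not start with `prefix` (assumed here to be a country code)."""
    ET.register_namespace("", KML_NS)  # write plain tags, not ns0: prefixes
    root = ET.fromstring(kml_text)
    ns = {"k": KML_NS}
    doomed = []
    # Collect first, remove afterwards, so the tree is not changed mid-walk.
    for parent in root.iter():
        for child in list(parent):
            if child.tag == f"{{{KML_NS}}}Placemark":
                name = child.findtext("k:name", default="", namespaces=ns)
                if not name.startswith(prefix):
                    doomed.append((parent, child))
    for parent, child in doomed:
        parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```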

Google Fusion Tables could then display the Natura 2000 areas on our map, but we wanted to do more: in the database we constructed a query that returned all contaminated and “likely contaminated” areas that overlapped the Natura 2000 areas.

Figure: map layer selector (Expensive poison grounds, Nature plots, V1 & V2, Google map)

We marked the 1.309 overlapping areas with an icon on the map so users could easily see the scope of the problem.
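The overlap query itself ran in the team's database and is not reproduced in this portfolio. As a rough illustration of the first step such a query typically performs, the sketch below uses a bounding-box overlap test: it finds contaminated areas whose bounding rectangle touches the rectangle of any protected area. This is a coarse pre-filter only; an exact answer needs a true polygon intersection test on the candidates. All names are hypothetical.

```python
def bbox(ring):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (x, y) points."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_overlap(a, b):
    """True if two bounding boxes share at least one point."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def candidate_overlaps(contaminated, protected):
    """Ids of contaminated areas whose box overlaps any protected area's box.

    Both arguments map an area id to its list of (x, y) vertices.
    """
    protected_boxes = [bbox(ring) for ring in protected.values()]
    return [
        area_id
        for area_id, ring in contaminated.items()
        if any(bboxes_overlap(bbox(ring), pb) for pb in protected_boxes)
    ]
```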

Links to articles:

http://www.dr.dk/nyheder/tema/jordforurening/forside.htm



Gambling without winning is not a puzzle – it’s a train ride

For once we broke with our motto that says if there is no news, we drop the data. In autumn 2013 the Danish Broadcasting Corporation ran a theme on gambling and lotto. We decided to make a small story for the web that visualizes exactly how small the chance of winning the lotto in Denmark really is.

Our graphics designer, Mads Rafte Hein, is also an artist. We used both of his excellent skills to draw a beautiful animated cartoon for the web. The story he drew showed that the chance of winning the lotto is just as small as your chance of hitting a bucket standing along the track with a coin, aiming from the window of a train as you pass it at speed. The story was based on calculations from two mathematics experts – and that was the data of the story ☺

Published November 2013. Time spent on this story: 20 days. Readers since publishing: 23.665

http://www.dr.dk/Nyheder/Indland/2013/11/27/145110.htm


Corporate tax

The project was to map and illustrate the corporate tax paid by Danish companies in 2012. The project was based on data released by the Danish tax authority SKAT. It is only the second time since 2012 that SKAT has released data in full on corporate taxes.

The Data Team was met with several challenges in collecting, analyzing and presenting the data. Some 250.000 companies are listed as taxable. 57.000 companies actually paid tax. One percent of these companies paid more than two thirds of the total corporate tax paid to the Danish exchequer. Within this one percent, just seven companies paid one third of the total corporate tax.

Finally, the corporate tax paid forms 5.6 percent of the total tax paid to the Danish exchequer by all taxpayers.
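Concentration figures like "one percent of the companies paid more than two thirds" can be computed from a plain list of tax payments. A minimal sketch, assuming one payment amount per company; the function name and the rounding of the group size are our own choices, not the team's documented method.

```python
def top_share(payments, fraction=0.01):
    """Share of the total tax paid by the top `fraction` of paying companies.

    Companies that paid nothing are left out, mirroring the distinction
    between taxable companies and companies that actually paid tax.
    """
    paid = sorted((p for p in payments if p > 0), reverse=True)
    if not paid:
        raise ValueError("no company paid any tax")
    k = max(1, round(len(paid) * fraction))  # size of the top group
    return sum(paid[:k]) / sum(paid)
```

Calling the same function with a fixed group size in mind (for example the seven largest payers) only requires slicing the sorted list differently.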

What did we do?

We decided to create a slideshow describing the corporate tax paid in 2012. This main story was to be accompanied by other stories describing the largest taxpayers, the companies that lost money in 2012, and the distribution of corporate tax among the Danish municipalities. Data was released by SKAT on the 5th of December 2012 and the stories were to be published on December 27 and 28.

Collection of data

The primary obstacle was the collection of data. Even though SKAT released data on all Danish companies’ tax payments in 2012, for political reasons the release was severely restricted. Because of the way the data was released, you could only access information on the companies one at a time. SKAT opened access to a database where you could search for companies by name or by registration number. Doing this gave you access to a page showing the company name, the registration number, type of corporation, applicable tax code, the corporate tax for the company, taxable income and deductible deficit. Furthermore, where applicable, the page would contain information on taxed income from oil extraction for companies operating in the North Sea, and information on companies under joint taxation.

Published December 2013. Time spent on this story: 20 days. Readers since publishing: 279.878

Because the data was released this way, we needed to set up an automatic scraper that would search SKAT’s database by company number and copy the data one company at a time.

To do this we downloaded a full list of all company registration numbers from the Danish company registry cvr.dk. This list was fed into a program coded by the Data Team’s programmer. The code was set up to collect between 20 and 30 individual company records per second from SKAT’s database. It took a few days to collect tax records on 243.000 taxable companies.
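The shape of such a scraper is a throttled loop over registration numbers. The sketch below is not the team's program: the lookup itself is passed in as a hypothetical `fetch` function, because the portfolio does not describe SKAT's interface, and the default rate of 25 per second is simply a value inside the range quoted above.

```python
import time


def scrape(reg_numbers, fetch, max_per_second=25, sleep=time.sleep):
    """Call fetch(reg_number) for every number, throttled to max_per_second.

    Returns (records, failed): a dict of successful lookups keyed by
    registration number, and a list of numbers whose lookup raised an error.
    """
    min_interval = 1.0 / max_per_second
    records, failed = {}, []
    for number in reg_numbers:
        started = time.monotonic()
        try:
            records[number] = fetch(number)
        except Exception:
            # Keep going; failed numbers can be retried in a later pass.
            failed.append(number)
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            sleep(min_interval - elapsed)
    return records, failed
```

Keeping the failed numbers in a separate list matters for a run of this length, since a single network error should not force the whole collection to start over.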

Since the data on each company was very sparse, we needed to combine the information collected from SKAT with data from the Danish company registry cvr.dk in order to get the address and corresponding municipality of each company.

Both the spreadsheet with the data from SKAT and the full list of company records from cvr.dk were in themselves too large to handle in Microsoft Excel. So in order to combine the data, we imported the data from the scraper into OpenRefine and combined it with the full list of company records from cvr.dk.
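The team did this combination in OpenRefine. For readers who want to see the logic spelled out, the same kind of join on the registration number can be sketched in Python; the key name `cvr` and the row layout are assumptions made for this example.

```python
def join_by_registration(tax_rows, registry_rows, key="cvr"):
    """Attach registry fields (address, municipality, ...) to each tax row.

    Returns (joined, unmatched): merged rows, and tax rows for which the
    registry had no entry with the same registration number.
    """
    registry = {row[key]: row for row in registry_rows}
    joined, unmatched = [], []
    for row in tax_rows:
        match = registry.get(row[key])
        if match is None:
            unmatched.append(row)
        else:
            # Tax fields win if both sources carry a field of the same name.
            joined.append({**match, **row})
    return joined, unmatched
```

Reporting the unmatched rows separately makes it easy to check how many companies were lost in the join before any analysis is done.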

After combining and cleaning up the data we were able to export it to files that could be imported into Microsoft Excel and analyzed there.

Analyzing the data

According to the Danish tax code, the municipalities each get 13,41 percent of the proceeds of the paid corporate tax. But parts of the proceeds are divided between the municipalities according to a set of distribution keys that depend on the distribution of employees, subsidiaries, etc. The proceeds are also distributed with a three-year delay, so that the proceeds each municipality receives in 2012 were actually paid in tax in 2009. In total this means that even though we had data on which municipalities the companies were located in, we were unable to directly compare the municipalities by proceeds based on the 2012 information from SKAT.

In order to get accurate data on municipal tax revenue, we collected data on the actual reported revenue from Statistics Denmark. These figures were divided by the municipality population from public records.
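The per-citizen figure is a simple division followed by a ranking. A minimal sketch, assuming two lookups keyed by municipality name; the function name is ours and no real figures are used.

```python
def revenue_per_citizen(revenue, population):
    """Corporate tax revenue per citizen, richest municipality first.

    `revenue` and `population` both map municipality name to a number.
    Municipalities without a (non-zero) population figure are skipped.
    """
    per_citizen = {
        name: revenue[name] / population[name]
        for name in revenue
        if population.get(name)
    }
    return sorted(per_citizen.items(), key=lambda item: item[1], reverse=True)
```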

Presenting the data

The first batch of stories, published on December 27th, centered on the municipalities’ revenues from corporate tax. The main story was carried by a map showing the revenue per citizen nationwide. Thus we could show the rich and the poor municipalities based on corporate tax revenue, revealing the very large differences in revenue across the country.

http://www.dr.dk/Nyheder/Penge/2013/12/23/195455.htm

We also published stories with lists of the wealthiest and poorest municipalities and accompanying interviews, also describing how the wealthiest municipalities lost revenue in the national redistribution that takes place each year according to the distribution keys briefly mentioned above.

The second batch of stories, published on December 28th, centered on the interactive graphics we developed to show the distribution of corporate tax payments.

http://www.dr.dk/Nyheder/Penge/2013/12/27/151440.htm

Again, the main story was accompanied by other articles covering different aspects of the issue. Company directors and experts were interviewed, adding nuance to the information presented and putting it in a national financial context.


A motion story about tax

The presentation uses the D3 framework for illustration and animation. The large data set was compiled into a dense format for transport to the client, where the information was extracted again to fit the animation model developed for this presentation.

The animation itself consists of roughly 1000 tiny boxes that are animated using randomized values for delay and duration, so each session shows a unique animation.

http://www.dr.dk/Nyheder/Penge/2013/12/27/151440.htm


Heritage-calculator

When the Danish Broadcasting Corporation (DR) launched its TV drama “The Heirs”, which aired on a Sunday evening to more than 1.723.000 viewers (about one third of the total population of Denmark), the news department followed up with a set of stories about the difficulties an inheritance can cause in a family. The Database Team contributed to the theme by creating a calculator where people could find out for themselves the amount of money they are entitled to.

See stories here: http://www.dr.dk/Nyheder/Tema/arv/forside.htm

Published January 2014. Time spent on this story: 2 days. Readers since publishing: 64.432

CO2 Calculator

When you travel by airplane it has a price in CO2. While airlines try to offer a ticket to a green conscience by giving customers the choice to pay a fee for the flight, almost nobody does. But travel across the world is still heavy on CO2. Our desk tried to show the “price” for the environment by creating a calculator that illustrates how much CO2 home appliances could emit before it is comparable to a given flight.

Links to article:

http://www.dr.dk/Nyheder/Indland/2014/01/31/170158.htm

http://www.dr.dk/Nyheder/Indland/2014/01/31/164547.htm

http://www.dr.dk/Nyheder/Indland/2014/01/31/163233.htm

http://www.dr.dk/Nyheder/Indland/2014/01/31/161618.htm

Published February 2014. Time spent on this story: 5 days. Readers since publishing: 22.000


EU citizens and the social benefits

In Denmark, politicians have for a while fought over whether people from other EU member states who live and work in Denmark should have the same access to Danish social benefits (such as unemployment pay, child benefits etc., which are free if you pay your taxes and live in Denmark). The debate relies on the premise that people from member states such as Poland, Romania and Lithuania come to Denmark to exploit the social benefits rather than to live and work like “the rest of us”. So the Database Team decided to investigate whether people from other EU states living in Denmark are exploiting the welfare system.

We started by asking the tax authorities for data on how many citizens from EU states are receiving child benefits (approximately 170 euro per child each month). The result was that yes, more people are receiving this benefit, but the total amount the Danish state spends on it is less than one percent of the total amount spent on child benefits for all parents living in Denmark.

http://www.dr.dk/Nyheder/Politik/2014/02/27/095702.htm

But the debate didn’t stop with that acknowledgement. It went on to the question of whether some people are exploiting social benefits such as unemployment payments and pensions. At the Database Team we again asked the data whether there were any facts to support or dismiss the accusations.

We got data from Statistics Denmark, which had to process a special request for us. After we got the data, we sorted it and created an interactive map of European countries, making it possible for our readers to click and see the share of citizens from each state who use social benefits in Denmark. The result of our analysis: no other state has citizens living in Denmark who are exploiting social benefits.

http://www.dr.dk/Nyheder/Indland/2014/03/05/170709.htm


Published March 2014. Time spent on this story: 3 days. Readers since publishing: 50.261