MED7126 Data and Multimedia Journalism Paul Bradshaw

Getting the data: Advanced search tips

Don’t ask for what you want: describe what you expect to find

Search operators

What text will it contain? Where will that text be? What text will it not contain?

Imagine the data: text

Specific references, not general:

Specify a constituency… …a school

…an institution code …an invoice number …a piece of jargon

Quotes: "disclosure log"

Asterisk: "between * and 2014"

Minus: "hate crime" -religion -"publication scheme"

Number ranges: 2000..2014

‘life expectancy Birmingham’

"life expectancy" "perry barr"

inurl:

inurl:foi inurl:ccg

inurl:intranet inurl:search.asp inurl:search.php

intitle: allintitle:

intitle:foi

allintitle:disclosure log

intitle:"bank fines"

intext: allintext:

intext:"miserable failure"

allintext:miserable failure

"life expectancy" "perry barr"

"life expectancy" "perry barr" filetype:xls

"life expectancy" "perry barr" filetype:xls site:ons.gov.uk

"life expectancy" "perry barr" filetype:xls site:ons.gov.uk 2009..2014

"life expectancy" "perry barr" filetype:xls site:ons.gov.uk 2009..2014 -winter
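The step-by-step narrowing above can also be scripted. What follows is a minimal sketch in Python, not part of the original slides: the function name build_query and its parameters are hypothetical, and the Google search URL pattern is an assumption about how the query would be passed to the search engine.

```python
from urllib.parse import urlencode


def build_query(phrases, filetype=None, site=None, year_range=None, exclude=()):
    """Assemble a search string from exact phrases plus optional operators."""
    # Exact phrases go in straight double quotes
    parts = [f'"{phrase}"' for phrase in phrases]
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    if year_range:
        start, end = year_range
        parts.append(f"{start}..{end}")  # number range, e.g. 2009..2014
    for term in exclude:
        # Multi-word exclusions need quotes, e.g. -"publication scheme"
        parts.append(f'-"{term}"' if " " in term else f"-{term}")
    return " ".join(parts)


query = build_query(
    ["life expectancy", "perry barr"],
    filetype="xls",
    site="ons.gov.uk",
    year_range=(2009, 2014),
    exclude=["winter"],
)
print(query)

# Assumed URL pattern for running the query in a browser
print("https://www.google.com/search?" + urlencode({"q": query}))
```

Dropping any of the optional arguments gives the earlier, broader searches in the sequence.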

Where is it likely to be? What format? When was it not published?

Imagine the data: metadata

site:

site:gov.uk site:nhs.uk

site:police.uk site:ac.uk site:org.uk

site:org site:birmingham.gov.uk site:met.police.uk/foi/

disclosure

filetype:

filetype:xls filetype:xlsx filetype:pdf filetype:csv filetype:ppt filetype:doc

filetype:docx filetype:xml

search tools

Combine operators:

"disclosure log" site:gov.uk

allintitle:hate crime report filetype:pdf site:police.uk

art inurl:search.asp -library
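Combining operators can be repeated across several sites and file formats at once. The sketch below is an illustration only: the helper queries_for is a hypothetical name, and the sites and formats are simply picked from the lists earlier in these slides.

```python
from itertools import product


def queries_for(phrase, sites, filetypes):
    """Return one search string per site/filetype combination."""
    return [
        f'"{phrase}" site:{site} filetype:{filetype}'
        for site, filetype in product(sites, filetypes)
    ]


batch = queries_for(
    "disclosure log",
    sites=["gov.uk", "police.uk"],
    filetypes=["xls", "xlsx", "csv"],
)

# Print each query so it can be pasted into a search engine
for q in batch:
    print(q)
```

Each printed line is a separate search, which makes it easier to see which site and format combination actually returns data.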

research.google.com

zanran.com

Sites that aren’t indexed

Some sites use the robots.txt protocol to tell search engines not to index them. Use DownThemAll to download the site and search it locally.
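To see which parts of a site a robots.txt file asks search engines to stay out of, Python's standard urllib.robotparser module can read the rules. This is a rough sketch under stated assumptions: the robots.txt content and the example.org addresses are invented for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example
robots_txt = """\
User-agent: *
Disallow: /intranet/
Disallow: /search.asp
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Hypothetical paths on a placeholder domain
for path in ["/foi/disclosure-log.pdf", "/intranet/minutes.doc"]:
    allowed = parser.can_fetch("*", "https://example.org" + path)
    status = "open to crawlers" if allowed else "blocked by robots.txt"
    print(path, "-", status)
```

Paths reported as blocked are the ones a search engine is unlikely to return, so they are candidates for downloading and searching locally.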

Do it now:

Search for a piece of jargon in your field, on a particular type of site.

Search for spreadsheets or PDFs mentioning an individual in your field.

Links: Pinboard.in/u:paulbradshaw/t:data+searchengine