Data Science in BIS Current activity and prospects Siobhan Carey Chief Statistician March 25 2015

• What we are doing

• Some reflections

Where we started from

[Diagram: Data Science at the overlap of IT / analytics, Stats / inference and topic specialism]

Key dimensions

• Infrastructure

• Skills

– Data analytics

– Data visualisation

– Interpretation

• Data – Big or otherwise

[Diagram: Data Science (IT / analytics, Stats / inference, topic specialism) alongside Tools, Platform, BISDATA and Big Data]

Tools

Data visualisation / mapping: D3, Leaflet, Mindjet, MapInfo, ArcGIS, QGIS, GIMP, Inkscape, Scribus…

Statistical analysis: SAS, SPSS, Stata, R, MATLAB, Vensim, X-13ARIMA-SEATS, SQL…

Scripting / coding: ActivePython, ActivePerl, OpenRefine, JavaScript, Gedit, NetBeans, TortoiseSVN, Scala

Stuff: Office, browsers

Deployed on a scalable virtual machine

Skill development

• Small central team

• Training / exposure

• New toolbox available to all (90 as of this week)

• Projects

– Trade visualisation

– Companies House

Skills - interpretation

What are we hoping for?

– New insights from our existing data holdings

– New ways of presenting data
  • Easier to absorb, easier to explore

– New data, relevant to BIS policy decisions, that can be harvested
  • Cheaper, more timely?
  • Replace existing data?
  • Supplement existing data?

Data

Existing BIS data:
• Large datasets in higher education (HE) and further education (FE)
• Data on businesses
• Data linking

New data sources:
• Big data: some potential projects
• Personal data: legal and disclosure issues

REFLECTIONS

Current paradigm: Farmer

• Cultivation
• Theoretical base
• Known properties
  - Coverage
  - Non-response
  - Known and tested variables
• Skills well developed, if in short supply
• Infrastructure
• Process
• Security of supply of inputs

New paradigm?


Forager

• Lacks statistical properties
• Unknown coverage/bias
• Doesn’t necessarily have the measures you want/need
• Lack of standardisation
• Needs alternative theoretical framework
• Growing on someone else’s land?
• Future supply uncertain

Thank you!

http://analysis.bis.gov.uk/trademap/

Images: 123RF (5050330 Pavel Mitrofanov; 5050330 loganban; 6666493 Silvia Crisman)
