Supporting Big Data, Open Data, Data Analytics and Data Science
Dr Simon Price, Research IT Manager
• Bristol is a research-intensive university
• 6 Faculties: Social Science & Law, Science, Engineering, Arts and two Medical Faculties
• Employs 2000+ researchers (excluding PhDs)
• Each year (approximately):
  • 1500 research funding applications
  • £100M research income
  • 4500 research outputs
Outline
1. Big Data
2. Open Data
3. Data Analytics
4. Data Science
5. Implications for IT support
Big Data
• Lots and lots of technology buzzwords!
• Some important ones:
  • MapReduce
  • The Hadoop stack
  • Distributed file systems
  • Query languages & programming languages
  • NoSQL databases (columns, document, graph, ...)
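To make the NoSQL categories concrete, here is a minimal sketch of one record held in three shapes: document, wide-column and graph. The record, field names and the `authors_of` helper are all hypothetical, and plain Python structures stand in for real database engines.

```python
# One hypothetical research-output record in three NoSQL shapes.
# Plain Python structures stand in for real database engines.

# Document model: one self-contained nested record.
document_record = {
    "id": "output-1",
    "title": "Example paper",
    "authors": ["A. Researcher", "B. Researcher"],
}

# Wide-column model: row key -> column family -> column -> value.
wide_column_rows = {
    "output-1": {
        "meta": {"title": "Example paper"},
        "authors": {"a1": "A. Researcher", "a2": "B. Researcher"},
    }
}

# Graph model: nodes plus labelled edges between them.
graph = {
    "nodes": {
        "output-1": {"title": "Example paper"},
        "a1": {"name": "A. Researcher"},
        "a2": {"name": "B. Researcher"},
    },
    "edges": [
        ("a1", "AUTHORED", "output-1"),
        ("a2", "AUTHORED", "output-1"),
    ],
}


def authors_of(graph, output_id):
    """Follow AUTHORED edges into output_id and return the author names."""
    return sorted(
        graph["nodes"][src]["name"]
        for src, relation, dst in graph["edges"]
        if relation == "AUTHORED" and dst == output_id
    )


graph_authors = authors_of(graph, "output-1")
column_authors = sorted(wide_column_rows["output-1"]["authors"].values())
```

The same question ("who wrote this output?") is a field lookup in the document shape, a column-family scan in the wide-column shape, and an edge traversal in the graph shape.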
MapReduce in a nutshell
Image source: https://developers.google.com/appengine/docs/python/dataprocessing/
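As a text stand-in for the diagram, the sketch below runs the three MapReduce stages (map, shuffle, reduce) on a word count in a single process. It illustrates the idea only and is not how Hadoop or App Engine implement it; the function names and sample input are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string plays the role of one input split.
counts = word_count(["big data open data", "data science"])
```

In a real cluster the map and reduce calls run in parallel on different machines, and the shuffle moves data across the network. Here all three stages run sequentially in one process.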
Big Data
• Trends in Hadoop stack
  • Near realtime analytics
  • Streaming analytics
  • In-memory
• Trends in NoSQL
  • Relational and NoSQL moving closer together
Open Data
Open Data - data.bris
• Each PI allocated 5TB "forever"
• Research Data Management
• Open Data Publication
Open Data - public data
Bristol is Open - datasets
• 140+ datasets live on opendata.bristol.gov.uk
• Some real time data
• Transport API repository now available
Examples
• Government: Elections since 2007
• Community: Quality of Life survey
• Education: School Results
• Energy: Installed PV, Energy Use in Council Buildings
• Environment: Real time & Historic Air Quality, Flood Alerts (EA)
• Land use: 2013 Planning applications
• Health: Life expectancy / Mortality, Obesity, NHS Spend
Data Analytics
• Operational focus
  • variables are "known knowns and known unknowns"
• Descriptive
  • summarisation of known variables and alerting
• Predictive
  • correlations between known variables
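The descriptive and predictive styles above can be sketched in a few lines. This is a minimal illustration with made-up numbers. The `describe` and `pearson` helpers and the alert threshold are assumptions for this example, not part of any particular analytics product.

```python
from math import sqrt
from statistics import mean


def describe(values, alert_above):
    """Descriptive: summarise a known variable and flag a threshold breach."""
    return {
        "count": len(values),
        "mean": mean(values),
        "max": max(values),
        "alert": max(values) > alert_above,
    }


def pearson(xs, ys):
    """Predictive starting point: Pearson correlation of two known variables."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up illustrative data: two variables measured over five periods.
active_users = [1, 2, 3, 4, 5]
storage_used = [2, 4, 6, 8, 10]

summary = describe(storage_used, alert_above=9)
correlation = pearson(active_users, storage_used)
```

A strong correlation between known variables supports prediction, but says nothing by itself about causation.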
Data Science
• Multidisciplinary data-intensive research
• Focus on research insights, causation and prediction
• Usually involves Machine Learning and Statistics
• Different perspectives:
  • Computer Scientists view DS as a research domain
  • Statisticians view DS as a research domain
  • Other academics view DS as a service
3 May 2023
Implications for IT support
• Governance
  • Shift from IT-owned to academic-owned (Shadow IT)
• Skills
  • IT experts need to train and trust academics
  • Nurture internal skills pipeline (interns, postgrads)
• Systems
  • Mixed economy of internal and external