88
EDISON Data Science Framework (EDSF) Data Science Professional Education and Training EDISON Project value proposition and legacy Yuri Demchenko, EDISON Project University of Amsterdam EDISON Initiative, December 2018 EDISON Project (2015-2017) Grant 675419 (INFRASUPP-4-2015: CSA)

EDISON Data Science Framework (EDSF)

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: EDISON Data Science Framework (EDSF)

EDISON Data Science Framework (EDSF)

Data Science Professional Education and Training

EDISON Project value proposition and legacy

Yuri Demchenko, EDISON Project

University of Amsterdam

EDISON Initiative, December 2018

EDISON Project (2015-2017)Grant 675419 (INFRASUPP-4-2015: CSA)

Page 2: EDISON Data Science Framework (EDSF)

Outline

• Background: Data driven research and demand for new skills

– Foundation, recent reports, studies and facts

• EDISON Data Science Framework (EDSF)

– Data Science competences and skills

– Essential Data Scientist professional skills: Thinking and doing like Data Scientist

• Data Science Professional Profiles

• Data Science Body of Knowledge and Model Curriculum

• Use of EDSF and Example curricula

– Competences assessment

– Building Data Science team

• Roadmap recommendations

• References and additional materials

EDISON 2018 Slide DeckData Science Professional Education and

Training2

This work is licensed under the Creative Commons Attribution 4.0 International License.

To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Page 3: EDISON Data Science Framework (EDSF)

Visionaries and Drivers:

Seminal works, High level reports, Activities

The Fourth Paradigm: Data-Intensive Scientific Discovery.

By Jim Gray, Microsoft, 2009. Edited by Tony Hey, Kristin Tolle, et al.

http://research.microsoft.com/en-us/collaboration/fourthparadigm/

EDISON 2018 Slide DeckData Science Professional Education and

Training3

Riding the wave: How Europe can gain from

the rising tide of scientific data.

Final report of the High Level Expert Group on

Scientific Data. October 2010.

http://cordis.europa.eu/fp7/ict/e-

infrastructure/docs/hlg-sdi-report.pdf

The Data Harvest: How

sharing research data

can yield knowledge,

jobs and growth.

An RDA Europe Report.

December 2014

https://rd-alliance.org/data-

harvest-report-sharing-data-

knowledge-jobs-and-growth.html

https://www.rd-alliance.org/

HLEG report on European

Open Science Cloud

(October 2016)

https://ec.europa.eu/research/openscienc

e/pdf/realising_the_european_open_scie

nce_cloud_2016.pdf

Emergence of Cognitive Technologies

(IBM Watson, Cortana and others)

Page 4: EDISON Data Science Framework (EDSF)

Riding the wave (2010): How Europe can gain

from the rising tide of scientific data.

• “Unlocking the full value of scientific data” - Neelie Kroes, Vice-President of the European Commission, responsible for the Digital Agenda

• Just how students will be trained in the future, or how the profession of “data scientist” will be developed, are among the questions the resolution of which is still evolving and will present intellectual challenges for both privately and publicly supported research. - John Wood, HLEG Chair

• Vision 2030: “Our vision is a scientific e-Infrastructure that supports seamless access, use, re-use and trust of data. In a sense, the physical and technical infrastructure becomes invisible and the data themselves become the infrastructure.”

• Proposed set of actions– 4. Train a new generation of data scientists, and broaden public understanding

We urge that the European Commission promote, and the member-states adopt, new policies to foster the development of advanced-degree programmes at our major universities for the emerging field of data scientist. We also urge the member-states to include data management and governance considerations in the curricula of their secondary schools, as part of the IT familiarisation programmes that are becoming common in European education.

EDISON 2018 Slide DeckData Science Professional Education and

Training4

Page 5: EDISON Data Science Framework (EDSF)

The Data Harvest (2014): How sharing research data

can yield knowledge, jobs and growth

• Planning the data harvest – John Wood

• The era of data driven science

• We want the right minds, with the right data, at the right time. That’s a tall order that requires change in: – The way science works and scientists think

– How scientific institutions operate and interact

– How scientists are trained and employed

Recommendation 2

• DO promote data literacy across society, from researcher to citizen. Embracing these new possibilities requires training and cultural education – inside and outside universities. Data science must be promoted– A first-class science: Data sharing provides the foundation for a new branch of

science.

– Data education: Training in the use, evaluation and responsible management of data needs to be embedded in curricula, across all subjects, from primary school to university.

– Training within EU projects

– Government and public sector training

EDISON 2018 Slide DeckData Science Professional Education and

Training5

Page 6: EDISON Data Science Framework (EDSF)

HLEG EOSC Report Essentials – Core Data

Experts [ref]

• Core Data Experts is a new class of colleagues with core scientific professional

competencies and the communication skills to fill the gap between the two

cultures.

– Core data experts are neither computer savvy research scientists nor are they hard-

core data or computer scientists or software engineers.

– They should be technical data experts, though proficient enough in the content domain

where they work routinely from the very beginning (experimental design, proposal

writing) until the very end of the data discovery cycle

– Converge two communities:

• Scientists need to be educated to the point where they hire, support and respect Core Data

Experts

• Data Scientists (Core Data Experts) need to bring the value to scientific research and

organisations

• Implementation of the EOSC needs to include instruments to help train, retain

and recognise this expertise,

– In order to support the 1.7 million scientists and over 70 million people working in

innovation.

EDISON 2018 Slide DeckData Science Professional Education and

Training6

[ref] https://ec.europa.eu/research/openscience/pdf/realising_the_european_open_science_cloud_2016.pdf

Page 7: EDISON Data Science Framework (EDSF)

EOSC Report Recommendations –

Implementation on training and skills

• I2.1: Set initial guiding principles to kick-start the initiative as quickly as

possible. – A first cohort of core data experts should be trained to translate the needs for data driven science

into technical specifications to be discussed with hard-core data scientists and engineers.

– This new class of core data experts will also help translate back to the hard- core scientists the

technical opportunities and limitations

• I3: Fund a concerted effort to develop core data expertise in Europe. – Substantial training initiative in Europe to locate, create, maintain and sustain the required core data

expertise.

– By 2022, to train (hundreds of thousands of) certified core data experts with a demonstrable

effect on ESFRI/e-INFRA activities and prospects for long-term sustainability of this critical

human resource

• Consolidate and further develop assisting material and tools for Data Management Plans and Data

Stewardship plans (including long-term preservation in FAIR status)

• I7: Provide a clear operational timeline to deal with the early preparatory

phase of the EOSC.– Define training needs for the necessary data expertise and draw models for the necessary

training infrastructure

EDISON 2018 Slide DeckData Science Professional Education and

Training7

Page 8: EDISON Data Science Framework (EDSF)

Initiatives: GO FAIR and IFDS

• Global Open FAIR

– Findable – Accessible – Interoperable - Reusable

• IFDS – Internet of FAIR Data and Services = EOSC

• GO FAIR implementation approach

– GO-TRAIN: Training of data stewards capable of providing FAIR data services

– FAIRdICT: Top Sector Health collaboration with top team ICT

• A critical success factor is availability of expertise in data stewardship

– Training of a new generation of FAIR data experts is urgently needed to provide the necessary capacity

https://www.dtls.nl/fair-data/

https://www.dtls.nl/fair-data/go-fair/

https://www.dtls.nl/fair-data/fair-data-training/

EDISON 2018 Slide DeckData Science Professional Education and

Training8

Page 9: EDISON Data Science Framework (EDSF)

Industry reports on Data Science Analytics and Data

enabled skills demand

• Final Report on European Data Market Study by IDC (Feb 2017)– The EU data market in 2016 estimated EUR 60 Bln (growth 9.5% from

EUR 54.3 Bln in 2015)

• Estimated EUR 106 Bln in 2020

– Number of data workers 6.1 mln (2016) - increase 2.6% from 2015

• Estimated EUR 10.4 million in 2020

– Average number of data workers per company 9.5 - increase 4.4%

– Gap between demand and supply estimated 769,000 (2020) or 9.8%

• PwC and BHEF report “Investing in America’s data science and analytics talent: The case for action” (April 2017)

– http://www.bhef.com/publications/investing-americas-data-science-and-analytics-talent

– 2.35 mln postings, 23% Data Scientist, 67% DSA enabled jobs

– DSA enabled jobs growing at higher rate than main Data Science jobs

• Burning Glass Technology, IBM, and BHEF report “The Quant Crunch: How the demand for Data Science Skills is disrupting the job Market” (April 2017)

– https://public.dhe.ibm.com/common/ssi/ecm/im/en/iml14576usen/IML14576USEN.PDF

– DSA enabled jobs takes 45-58 days to fill: 5 days longer than average

– Commonly required work experience 3-5 yrs

EDISON 2018 Slide DeckData Science Professional Education and

Training9

Citing EDISON and EDSF

Influenced by EDISON

Page 10: EDISON Data Science Framework (EDSF)

PwC&BHEF: Demand for DSA enabled jobs

Demand for business people with

analytics skills, not just data

scientists

• Of 2.35 million job postings in

the US

– 23% Data Scientist

– 67% DSA enabled jobs

• Strong demand for managers

and decision makers with Data

Science (data analytics)

skills/understanding

– Challenge to deliver actionable

knowledge and competences to

CEO level managers

EDISON 2018 Slide DeckData Science Professional Education and

Training10

Page 11: EDISON Data Science Framework (EDSF)

PwC&BHEF: Data Science and Data Analytics

Competences for Managers and Decision Makers

Percent of employers who say data science and analytics skills will be ‘required of all managers’ by 2020

• Source: BHEF and Gallup, Data Science and Analytics Business Survey (December 2016).

EDISON 2018 Slide DeckData Science Professional Education and

Training11

Page 12: EDISON Data Science Framework (EDSF)

PwC&BHEF: Skills that are tough to find

EDISON 2018 Slide DeckData Science Professional Education and

Training12

To be mapped to Competences, Knowledge, Skills and Personal (soft) Skills

Faster growing jobs require both analytical and social skills

Page 13: EDISON Data Science Framework (EDSF)

PwC&BHEF: Data Science and Analytics skills, by

2021: The supply-demand challenge

EDISON 2018 Slide DeckData Science Professional Education and

Training13

Page 14: EDISON Data Science Framework (EDSF)

IBM&BGT: DSA Jobs Time to Fill and Salary

(2016-2017)

• On average, DSA jobs in Professional Services remain open for 53 days, eight days longer than the overall DSA average. (IBM, BGT 2017 Study)

EDISON 2018 Slide DeckData Science Professional Education and

Training14

Page 15: EDISON Data Science Framework (EDSF)

Closer look at Data related Jobs and Salaries (2016)

Source: The Job Market for Data Professionals, by Robert R Downs, SciDataCon2016

http://www.scidatacon.org/2016/sessions/98/poster/51/

EDISON 2018 Slide DeckData Science Professional Education and

Training15

Page 16: EDISON Data Science Framework (EDSF)

OECD and UN on Digital Economy and Data Literacy

OECD (Organisation for Economic Coopration and Development)

• Demand for new type of “dynamic self-re-skilling workforce”

• Continuous learning and professional development to become a shared responsibility of workers and organisations

[ref] Skills for a Digital World, OECD, 25-May-2016 http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=DSTI/ICCP/IIS(2015)10/FINAL&docLanguage=En

UN

• Data Revolution Report "A WORLD THAT COUNTS" Presented to Secretary-General (2014)http://www.undatarevolution.org/report/

• Data Literacy is defined as key for digital revolution and Industry 4.0

• Data literacy = critically analyse data collected and data visualised

EDISON 2018 Slide DeckData Science Professional Education and

Training16

Page 17: EDISON Data Science Framework (EDSF)

PwC study: Millennials at work (2016) - 1

EDISON 2018 Slide DeckData Science Professional Education and

Training17

Confirmed results of previous studies:

• Loyalty-lite to company

– The power of employer brands and the waning

importance of corporate responsibility

• A time of compromise: benefit from individual

package negotiation

• Development and work/life balance are more

important than position or salary

– Work/life balance and diversity promises are

not being kept

• Financial reward is secondary but cash bonuses are valued

• A techno generation avoiding face time and prefer network

communication

• Moving up the ladder faster expectation but often not confirmed

by hard work required

• Generational communication but not without tensions

Page 18: EDISON Data Science Framework (EDSF)

PwC study: Millennials at work (2016) - 2

• What organisation is an attractive employer?– Opportunities for career progression

– Competitive wages/other financial incentives

– Excellent training/development programmes

• Factors most influenced decision to accept your current job?

– The opportunity for personal development

– The reputation of the organisation

– The role itself

• Which three benefits would you most value from an employer?

– Training and development

– Flexible working hours

– Cash bonuses

EDISON 2018 Slide DeckData Science Professional Education and

Training18

What can employers do?

Business leaders and HR need to work together to:

• Understand this generation

• Get the ‘deal’ right

• Help millennials grow

• Feedback, feedback and more feedback

• Set them free

• Encourage learning

• Allow faster advancement

• Expect millennials to go

Page 19: EDISON Data Science Framework (EDSF)

Industry 4.0 and demand for new skills

The Fourth Industrial

Revolution, which includes

developments in previously

disjointed fields such as

artificial intelligence and

machine-learning, robotics,

nanotechnology, 3-D printing,

and genetics and

biotechnology, will cause

widespread disruption not

only to business models but

also to labour markets over

the next five years, with

enormous change predicted

in the skill sets needed to

thrive in the new landscape.

This is the finding of a new

report, The Future of Jobs,

published today by the World

Economic Forum.

EDISON 2018 Slide DeckData Science Professional Education and

Training19

Page 20: EDISON Data Science Framework (EDSF)

Maritime Industry Digital Transformation

and Skills Strategy – Toward Industry 4.0

KNOWLEDGE TRANSFER

LONG-TERM SKILLS STRATEGY

OCEAN LITERACY

CURRICULUM DESIGN

EDUCATION & TRAINING

OCCUPATIONAL PROFILES

Digital Transformation• Digitalisation and IoT• Intelligent Information• Data Management• Digital Assets Manage• Data Driven Optimisation • Agile Continuous

Improvement• Customer Experience• People and skills

Digital Competences/Skills• Automation, robotics,

electrical vehicles• Information and data

literacy • Communication and

collaboration• Digital content creation,

safety • Problem solving and

critical thinking

Big Data

Internet ofThings

SystemIntegration

AdditiveManufacturing

EDISON 2018 Slide DeckData Science Professional Education and

Training20

Page 21: EDISON Data Science Framework (EDSF)

Challenge for Education:

Sustainable ICT and Data Skills Development

• Educate vs Train– Training is a short term solution

– Education is a basis for sustainable skills development

• Technology focus changes every 3-4 years– Study: 50% of academic curricula are outdated at the time of graduation

• Lack of necessary skills leads to underperforming projects and organisations and loose of competitiveness– Challenge: Policy and decision makers still don’t include planning human factor

(competences and skills) as a part of the technology strategy

• Need to change the whole skills management paradigm– Dynamic (self-) re-skilling: Continuous professional development and shared

responsibility between employer and employee

– Professional and workplace skills and career management as a part of professional orientation

• Millennials factor and changing nature of workforce

EDISON 2018 Slide DeckData Science Professional Education and

Training21

Page 22: EDISON Data Science Framework (EDSF)

• EDISON Data Science Framework (EDSF)

– Compliant with EU standards on competences and

professional occupations e-CFv3.0, ESCO

– Customisable courses design for targeted education and

training

• Skills development and career management for Core

Data Experts and related data handling professions

• Capacity building and Data Science team design

• Academic programmes and professional training

courses (self) assessment and design

• EU network of Champion universities pioneering Data

Science academic programmes

• Engagement in relevant RDA activities and groups

• Cooperation with International professional organisations

IEEE, ACM, BHEF, APEC (AP Economic Cooperation )

EDISON Products for Data Science Skills

Management and Tailored Education

EDISON 2018 Slide DeckData Science Professional Education and

Training22

Page 23: EDISON Data Science Framework (EDSF)

EDISON Data Science Framework (EDSF)

EDISON 2018 Slide DeckData Science Professional Education and

Training23

EDISON Framework components– CF-DS – Data Science Competence Framework

– DS-BoK – Data Science Body of Knowledge

– MC-DS – Data Science Model Curriculum

– DSP – Data Science Professional profiles

– Data Science Taxonomies and Scientific Disciplines

Classification

– EOEE - EDISON Online Education Environment

Methodology

• ESDF development based on job market

study, existing practices in academic,

research and industry.

• Review and feedback from the ELG, expert

community, domain experts.

• Input from the champion universities and

community of practice.

Page 24: EDISON Data Science Framework (EDSF)

What challenges related to skills management the

EDSF can help to address?

1. Guide researchers in using right methods and tools, latest Data

Analytics technologies to extracting value from scientific data

2. Educate and train RI engineers dev to build modern data intensive

research infrastructure and understand trends and project for future

3. Develop new data analytics tools and ensure continuous improvement

(agile model, DevOps)

4. Correctly organise and manage data, make them accessible (adhering

FAIR principles), education new profession of Data Stewards

5. Help managers to facilitate career dev for researchers and organise

effective teams

6. Ensure skills and expertise sustain in organisation

7. Help research institutions to sustain in competition with industry and

business in data science talent hunting

EDISON 2018 Slide DeckData Science Professional Education and

Training24

Page 25: EDISON Data Science Framework (EDSF)

Competences Map to Knowledge and Skills

• Competence is a demonstrated ability to apply knowledge, skills and attitudes for achieving observable results

EDISON 2018 Slide DeckData Science Professional Education and

Training25

Page 26: EDISON Data Science Framework (EDSF)

CF-DS/EDSF Approach: Compatibility with e-

CFv3.0 structure and 4-dimensional model

• Competence Framework for Data Science (CF-DS) definition will be built based on European e-Competence framework for IT (e-CFv3.0)

– Linking scientific research lifecycle, organizational roles, competences, skills and knowledge

– Defining Data Science Body of Knowledge (DS-BoK)

– Mapping CF-DS and DS-BoK to academic disciplines

EDISON 2018 Slide DeckData Science Professional Education and

Training26

• Multiple use of e-CFv3.0 within ICT organisations

• Provides basis for individual career path, competence assessment, training and certification

• EDISON CF-DS will be used for defining DS-BoK and MC-DS, linking organizational functions and required knowledge

• Provide basis for individual (self) training and certification

Page 27: EDISON Data Science Framework (EDSF)

Data Scientist definition

Based on the definitions by NIST SP1500 – 2015, extended by EDISON

• A Data Scientist is a practitioner who has sufficient

knowledge in the overlapping regimes of expertise in

business needs, domain knowledge, analytical skills,

and programming and systems engineering expertise

to manage the end-to-end scientific method process

through each stage in the big data lifecycle till the delivery

of an expected scientific and business value to

organisation or project.

EDISON 2018 Slide DeckData Science Professional Education and

Training27

• Core Data Science competences and skills groups

– Data Science Analytics (including Statistical Analysis, Machine Learning, Business Analytics)

– Data Science Engineering (including Software and Applications Engineering, Data

Warehousing, Big Data Infrastructure and Tools)

– Domain Knowledge and Expertise (Subject/Scientific domain related)

• EDISON identified 2 additional competence groups demanded by organisations

– Data Management, Data Governance, Stewardship, Curation, Preservation

– Research Methods and/vs Business Processes/Operations

• Data Science professional skills: Thinking and acting like Data Scientist – required to

successfully develop as a Data Scientist and work in Data Science teams

Page 28: EDISON Data Science Framework (EDSF)

Data Science Competence Groups - Research

EDISON 2018 Slide DeckData Science Professional Education and

Training28

Scientific Methods

• Design Experiment

• Collect Data

• Analyse Data

• Identify Patterns

• Hypothesis Explanation

• Test Hypothesis

Business Operations

• Operations Strategy

• Plan

• Design & Deploy

• Monitor & Control

• Improve & Re-design

Data Science Competences

include 5 groups

• Data Science Analytics

• Data Science Engineering

• Domain Knowledge and Expertise

• Data Management

• Research Methods and Project

Management

– Business Process Management (biz)

Page 29: EDISON Data Science Framework (EDSF)

Scientific Methods

• Design Experiment

• Collect Data

• Analyse Data

• Identify Patterns

• Hypothesise Explanation

• Test Hypothesis

Business Process

Operations/Stages

• Design

• Model/Plan

• Deploy & Execute

• Monitor & Control

• Optimise & Re-design

Data Science Competences Groups – Business

EDISON 2018 Slide DeckData Science Professional Education and

Training29

Data Science Competences

include 5 groups

• Data Science Analytics

• Data Science Engineering

• Domain Knowledge and Expertise

• Data Management

• Research Methods and Project

Management

– Business Process Management (biz)

Page 30: EDISON Data Science Framework (EDSF)

Identified Data Science Competence GroupsData Science Analytics (DSDA)

Data Science Engineering (DSENG)

Data Management and Governance (DSDM)

Research/Scientific Methods and Project Management (DSRMP)

Data Science Domain Knowledge, e.g. Business Analytics (DSDK/DSBPM)

0 Use appropriate data analytics and statistical techniques on available data to deliver insights into research problem or org. processes and support decision making

Use engineering principles and modern computer technology to research, design, implement new data analytics applications, develop experiments, processes, instruments, systems and infrastructures to support data handling during the whole data lifecycle

Develop and implement data management strategy for data collection, storage, preservation, and availability for further processing.

Create new understandings and capabilities by using the scientific method (hypothesis, test/artefact, evaluation) or similar engineering methods to discover new approaches to create new knowledge and achieve research or organisational goals

DSDK/DSBA Use domain knowledge (scientific or business) to develop relevant data analytics applications; adopt general Data Science methods to domain specific data types and presentations, data and process models, organisational roles and relations

1 DSDA01Effectively use variety of data analytics techniques

DSENG01Use engineering principles (general and software) to research, design, develop and implement new instruments and applications

DSDM01Develop and implement data strategy, in particular, Data Management Plan (DMP)

DSRMP01Create new understandings and capabilities by using scientific/ research methods

DSBPM01Understand business and provide insight, translate unstructured business problems into an abstract mathematical framework

2 DSDA02Apply designated quantitative techniques

DSENG02Develop and apply computer methods to domain related problems

DSDM02Develop data models including metadata

DSRMP02Direct systematic study toward a fuller knowledge or understanding of the observable facts

DSBPM02Participate strategically and tactically in financial decisions

3 DSDA03Pull together data from diff sources …

DSENG03Develop and prototype data analytics applications

DSDM03Collect integrate data

DSRMP03Undertakes creative work

DSBPM03Provides support services to other

4 DSDA04Use diff perform techniques

DSENG04Develop, deploy operate Big Data storage

DSDM04Maintain repository

DSRMP04Translate strategies into actions

DSBPM04Analyse data for marketing

5 DSDA05Develop analytics applic

DSENG05Apply security mechanisms

DSDM05Visualise cmplx data

DSRMP05Contribute to organis goals

DSBPM05Analyse optimise customer relatio

6 DSDA06Visualise results of analysis, dashboards

DSENG06Design, build, operate SQL and NoSQL

DSRM06Develop and manage prolicies

DSRMP06Develop and guide data driven projects

DSBPM06Analyse data for marketing 30EDISON 2018 Slide Deck

Data Science Professional Education and

Training

Page 31: EDISON Data Science Framework (EDSF)

Identified Data Science Skills/Experience Groups

Skills Type A – Based on knowledge acquired

• Group 1: Skills/experience related to competences

– Data Analytics and Machine Learning

– Data Management/Curation (including both general data management and scientific data management)

– Data Science Engineering (hardware and software) skills

– Scientific/Research Methods or Business Process Management

– Application/subject domain related (research or business)

• Group 2: Mathematics and statistics

– Mathematics and Statistics and others

Skills Type B – Base on practical or workplace experience

• Group 3: Big Data (Data Science) tools and platforms

– Big Data Analytics platforms

– Mathematics & Statistics applications & tools

– Databases (SQL and NoSQL)

– Data Management and Curation platform

– Data and applications visualisation

– Cloud based platforms and tools

• Group 4: Data analytics programming languages and IDE

– General and specialized development platforms for data analysis and statistics

• Group 5: Soft skills and Workplace skills

– Data Science professional skills: Thinking and Acting like Data Scientist

– 21st Century Skills: Personal, inter-personal communication, team work, professional network

EDISON 2018 Slide DeckData Science Professional Education and

Training31

Page 32: EDISON Data Science Framework (EDSF)

Example Data Science

Competences Definition

Compliant with e-CFv3.0

EDISON 2018 Slide DeckData Science Professional Education and

Training32

Page 33: EDISON Data Science Framework (EDSF)

Group 5: Soft skills and Workplace skills

• Data Science professional skills: Thinking and Acting like Data Scientist

• 21st Century Skills: Personal, inter-personal communication, team work,

professional network

• Data Scientist and Subject Domain Specialist

EDISON 2018 Slide DeckData Science Professional Education and

Training33

Page 34: EDISON Data Science Framework (EDSF)

Data Science Professional Skills:

Thinking and Acting like Data Scientist

1. Recognise value of data, work with raw data, exercise good data intuition, use SN and open data

2. Accept (be ready for) iterative development, know when to stop, comfortable with failure, accept the

symmetry of outcome (both positive and negative results are valuable)

3. Good sense of metrics, understand importance of the results validation, never stop looking at

individual examples

4. Ask the right questions

5. Respect domain/subject matter knowledge in the area of data science

6. Data driven problem solver and impact-driven mindset

7. Be aware about power and limitations of the main machine learning and data analytics algorithms

and tools

8. Understand that most of data analytics algorithms are statistics and probability based, so any

answer or solution has some degree of probability and represent an optimal solution for a number of

variables and factors

9. Recognise what things are important and what things are not important (in data modeling)

10. Working in agile environment and coordinate with other roles and team members

11. Work in multi-disciplinary team, ability to communicate with the domain and subject matter experts

12. Embrace online learning, continuously improve your knowledge, use professional networks and

communities

13. Story Telling: Deliver actionable result of your analysis

14. Attitude: Creativity, curiosity (willingness to challenge status quo), commitment in finding new

knowledge and progress to completion

15. Ethics and responsible use of data and insight delivered, awareness of dependability (data scientist is

a feedback loop in data driven companies)

EDISON 2018 Slide DeckData Science Professional Education and

Training34

Page 35: EDISON Data Science Framework (EDSF)

Data Science Professional Skills:

Thinking and Acting like Data Scientist (1)

1. Recognise value of data, work with raw data, exercise good data intuition, use

SN and Open Data

2. Accept (be ready for) iterative development, know when to stop, comfortable

with failure, accept the symmetry of outcome (both positive and negative results

are valuable)

3. Good sense of metrics, understand importance of the results validation, never

stop looking at individual examples

4. Ask the right questions

5. Respect domain/subject matter knowledge in the area of data science

6. Data driven problem solver and impact-driven mindset

7. Be aware about power and limitations of the main machine learning and data

analytics algorithms and tools

8. Understand that most of data analytics algorithms are statistics and

probability based, so any answer or solution has some degree of probability

and represent an optimal solution for a number of variables and factors

EDISON 2018 Slide DeckData Science Professional Education and

Training35

Page 36: EDISON Data Science Framework (EDSF)

Data Science Professional Skills:

Thinking and Acting like Data Scientist (2)

1. S

2. S

3. s

4. Ss

5. S

6. S

7. S

8. s

9. Recognise what things are important and what things are not important (in data

modeling)

10. Working in agile environment and coordinate with other roles and team members

11. Work in multi-disciplinary team, ability to communicate with the domain and subject

matter experts

12. Embrace online learning, continuously improve your knowledge, use professional

networks and communities

13. Story Telling: Deliver actionable result of your analysis

14. Attitude: Creativity, curiosity (willingness to challenge status quo), commitment in finding

new knowledge and progress to completion

15. Ethics and responsible use of data and insight delivered, awareness of dependability

(data scientist is a feedback loop in data driven companies)

EDISON 2018 Slide DeckData Science Professional Education and

Training36

Page 37: EDISON Data Science Framework (EDSF)

21st Century Skills (DARE & BHEF & EDISON)

1. Critical Thinking: Demonstrating the ability to apply critical thinking skills to solve problems and make effective decisions

2. Communication: Understanding and communicating ideas

3. Collaboration: Working with other, appreciation of multicultural difference

4. Creativity and Attitude: Deliver high quality work and focus on final result, intitiative, intellectual risk

5. Planning & Organizing: Planning and prioritizing work to manage time effectively and accomplish assigned tasks

6. Business Fundamentals: Having fundamental knowledge of the organization and the industry

7. Customer Focus: Actively look for ways to identify market demands and meet customer or client needs

8. Working with Tools & Technology: Selecting, using, and maintaining tools and technology to facilitate work activity

9. Dynamic (self-) re-skilling: Continuously monitor individual knowledge and skills as shared responsibility between employer and employee, ability to adopt to changes

10. Professional networking: Involvement and contribution to professional network activities

11. Ethics: Adhere to high ethical and professional norms, responsible use of power data driven technologies, avoid and disregard un-ethical use of technologies and biased data collection and presentation

EDISON 2018 Slide DeckData Science Professional Education and

Training37

Page 38: EDISON Data Science Framework (EDSF)

Data Scientist and Subject Domain Specialist

• Subject domain components

– Model (and data types)

– Methods

– Processes

– Domain specific data and presentation/visualization methods

– Organisational roles and relations

• Data Scientist is an assistant to Subject Domain Specialists

– Translate subject domain Model, Methods, Processes into abstract data driven form

– Implement computational models in software, build required infrastructure and tools

– Do (computational) analytic work and present it in a form understandable to subject

domain

– Discover new relations originated from data analysis and advice subject domain

specialist

– Present/visualise information in domain related actionable way

– Interact and cooperate with different organizational roles to obtain data and deliver

results and/or actionable data

EDISON 2018 Slide DeckData Science Professional Education and

Training38

Page 39: EDISON Data Science Framework (EDSF)

Data Science and Subject Domains

EDISON 2018 Slide DeckData Science Professional Education and

Training39

• Models (and data types)

• Methods

• Processes

Domain specific components

Domain specific

data & presentation

(visualization)

Organisational

roles

• Abstract data driven

math&compute models

• Data Analytics

methods

• Data and Applications

Lifecycle Management

Data Science domain components

Data structures &

databases/storage

Visualisation

Cross-

organisational

assistive role

Data Scientist/Data Steward functions is to translate between two domains

Data Scientist role is to maintain the Data Value Chain (domain specific): • Data Integration => Organisation/Process/Business Optimisation => Innovation

Page 40: EDISON Data Science Framework (EDSF)

Practical Application of the CF-DS

• Basis for the definition of the Data Science Body of Knowledge (DS-BoK) and Data Science Model Curriculum (MC-DS)

– CF-DS => Learning Outcomes (MC-DS) => Knowledge Areas (DS-BoK)

– CF-DS => Data Science taxonomy of scientific subjects and vocabulary

• Data Science professional profiles definition

– Extend existing EU standards and occupations taxonomies: e-CFv3.0, ESCO, others

• Professional competence benchmarking

– For customizable training and career development

– Including CV or organisational profiles matching

• Professional certification

– In combination with DS-BoK professional competences benchmarking

• Vacancy construction tool for job advertisement (for HR)

– Using controlled vocabulary and Data Science Taxonomy

EDISON 2018 Slide DeckData Science Professional Education and

Training40

Page 41: EDISON Data Science Framework (EDSF)

Data Science Professions Family

EDISON 2018 Slide DeckData Science Professional Education and

Training41

Icons used: Credit to [ref] https://www.datacamp.com/community/tutorials/data-science-industry-infographic

Page 42: EDISON Data Science Framework (EDSF)

DSP Profiles mapping to ESCO Taxonomy

High Level Groups

• DSP Profiles mapping to corresponding CF-DS Competence Groups – Relevance level from 5 – maximum to 1 – minimum

EDISON 2018 Slide DeckData Science Professional Education and

Training42

Page 43: EDISON Data Science Framework (EDSF)

CF-DS and Data Science Professional Profiles

EDISON 2018 Slide DeckData Science Professional Education and

Training43

Whole range of

DSA enabled Domain

related Profiles

Data Management

and Stewardship

Page 44: EDISON Data Science Framework (EDSF)

Example DS Professional Profile Definition

(compliant with CWA)

EDISON 2018 Slide DeckData Science Professional Education and

Training44

Profile title Gives a commonly used name to a profile. TEMPLATE

Summary statement Indicates the main purpose of the profile.

The purpose is to present to stakeholders and users a brief, concise

understanding of the specified ICT Profile. It should be understandable

by ICT professionals, ICT managers and Human Resource personnel.

It should provide a statement of the job’s main activity.

Mission Describes the rationale of the profile.

The purpose is to specify the designated job role defined in the ICT Profile.

Deliverables Accountable (A) Responsible (R) Contributor (C)

Specifies the Profile by key deliverables.

The purpose is to illuminate the ICT Profiles and to explain relevance

including the perspective from a non-ICT point of view.

Main task/s Provides a list of typical tasks to be performed by the profile.

A task is an action taken to achieve a result within a broadly defined

context. Tasks may be associated with deadlines, resources, goals,

specifications and/or the expected results.

e-CF

competences

assigned

Provides a list of necessary competences (from the e-CF) to carry out the

mission.

Must include 1 up to 5 competences.

Level assignment is important. Can be (usually) 1 or (maximum) 2 levels.

KPI Area Based upon KPIs (Key Performance Indicators) KPI area is a more generic

indicator, congruent with the overall profile granularity level. It is deployed

to add depth to the mission.

Not prescriptive. Non-specific measurements. Use general examples.

The principle is to provide KPI areas (which are stable, general and long

lasting) providing users with an inspiration to enable development of

specific KPI’s for specific roles

Must be related to the key deliverables in order to measure them.

Page 45: EDISON Data Science Framework (EDSF)

EDSF for Education and Training

• Foundation and methodological base

– Data Science Body of Knowledge (DS-BoK)

• Taxonomy and classification of Data Science related scientific subjects

– Data Science Model Curriculum (MC-DS)

• Set Learning Units mapped to CF-DS Learning and DS-BoK Knowledge Areas/Units

– Instructional methodologies and teaching models

• Platforms and environment

– Virtual labs, datasets, developments platforms

– Online education environment and courses management

• Services

– Individual benchmarking and profiling tools (competence assessment)

– Knowledge evaluation tools

– Certifications and training for self-made Data Scientists practitioners

– Education and training marketplace: Courses catalog and repository

EDISON 2018 Slide DeckData Science Professional Education and

Training45

Page 46: EDISON Data Science Framework (EDSF)

Data Science Body of Knowledge (DS-BoK)

DS-BoK Knowledge Area Groups (KAG)

• KAG1-DSA: Data Analytics group including

Machine Learning, statistical methods,

and Business Analytics

• KAG2-DSE: Data Science Engineering group

including Software and infrastructure engineering

• KAG3-DSDM: Data Management group including data curation,

preservation and data infrastructure

• KAG4-DSRM: Research Methods and Project Management group

• KAG5-DSBA: Business Analytics and Business Intelligence

• KAG* - DSDK: Data Science domain knowledge to be defined by related

expert groups

EDISON 2018 Slide DeckData Science Professional Education and

Training46

Page 47: EDISON Data Science Framework (EDSF)

Data Science Body of Knowledge (1)

KA Groups Suggested DS Knowledge Areas (KA) Knowledge Areas from existing BoK and CCS2012 scientific subject groups

KAG1-DSDA: Data Science Analytics

KA01.01 (DSDA.01/SMDA) Statistical methods for data analysisKA01.02 (DSDA.02/ML) Machine LearningKA01.03 (DSDA.03/DM) Data MiningKA01.04 (DSDA.04/TDM) Text Data MiningKA01.05 (DSDA.05/PA) Predictive AnalyticsKA01.06 (DSDA.06/MODSIM) Computational modelling, simulation and optimisation

There is no formal BoK defined for Data Analytics.

Data Science Analytics related scientific subjects from CCS2012:CCS2012: Computing methodologiesCCS2012: Mathematics of computingCCS2012: Computing methodologies

KAG2-DSENG: Data Science Engineering

KA02.01 (DSENG.01/BDI) Big Data Infrastructure and TechnologiesKA02.02 (DSENG.02/DSIAPP) Infrastructure and platforms for Data Science applicationsKA02.03 (DSENG.03/CCT) Cloud Computing technologies for Big Data and Data AnalyticsKA02.04 (DSENG.04/SEC) Data and Applications securityKA02.05 (DSENG.05/BDSE) Big Data systems organisation and engineeringKA02.06 (DSENG.06/DSAPPD) Data Science (Big Data) applications designKA02.07 (DSENG.07/IS) Information systems (to support data driven decision making)

ACM CS-BoK selected KAs:AR - Architecture and Organization (including computer architectures and network architectures)CN - Computational ScienceIM - Information ManagementSE - Software Engineering (can be extended with specific SWEBOK KAs)

SWEBOK selected KAs• Software requirements• Software design• Software engineering process• Software engineering models and methods• Software qualityData Science Analytics related scientific subjects

from CCS2012

EDISON 2018 Slide DeckData Science Professional Education and

Training47

Page 48: EDISON Data Science Framework (EDSF)

Data Science Body of Knowledge (2)

KA Groups Suggested DS Knowledge Areas (KA) Knowledge Areas from existing BoK and CCS2012 scientific subject groups

KAG3-DSDM: Data Management

KA03.01 (DSDM.01/DMORG) General principles and concepts in Data Management and organisationKA03.02 (DSDM.02/DMS) Data management systemsKA03.03 (DSDM.03/EDMI) Data Management and Enterprise data infrastructureKA03.04 (DSDM.04/DGOV) Data GovernanceKA03.05 (DSDM.05/BDST0R) Big Data storage (large scale) KA03.06 (DSDM.05/DLIB) Digital libraries and archives

DM-BoK selected KAs(1) Data Governance, (2) Data Architecture, (3) Data Modelling and Design, (4) Data Storage and Operations, (5) Data Security, (6) Data Integration and Interoperability, (7) Documents and Content, (8) Reference and Master Data, (9) Data Warehousing and Business Intelligence, (10) Metadata, and (11) Data Quality.

KAG4-DSRM: Research Methods and Project Management

KA04.01 (DSRMP.01/RM) Research MethodsKA04.01 (DSRMP.02/PM) Project Management

There are no formally defined BoK for research methodsPMI-BoK selected KAs• Project Integration Management • Project Scope Management • Project Quality • Project Risk Management

KAG5-DSBPM: Business Analytics

KA05.01 (DSBA.01/BAF) Business Analytics FoundationKA05.02 (DSBA.02/BAEM) Business Analytics organisation and enterprise management

BABOK selected KAs *)Business Analysis Planning and MonitoringRequirements Life Cycle ManagementSolution Evaluation and improvements recommendation

EDISON 2018 Slide DeckData Science Professional Education and

Training48

Page 49: EDISON Data Science Framework (EDSF)

Data Science Model Curriculum (MC-DS)

Data Science Model Curriculum includes

• Learning Outcomes (LO) definition based on CF-DS– LOs are defined for CF-DS competence groups and for all

enumerated competences

– Knowledge levels: Familiarity, Usage, Assessment (based in Bloom’s Taxonomy)

• LOs mapping to Learning Units (LU)– LUs are based on CCS(2012) and universities best practices

– Data Science university programmes and courses inventory (interactive)http://edison-project.eu/university-programs-list

• LU/course relevance: Mandatory Tier 1, Tier 2, Elective, Prerequisite

• Learning methods and learning models (in progress)

EDISON 2018 Slide DeckData Science Professional Education and

Training49

Page 50: EDISON Data Science Framework (EDSF)

Learning methods and learning models

• Bloom’s Taxonomy and Cognitive learning activities

– BT application areas and limitations

• Constructive Alignment and Intended Learning Outcome (ILO)

– ILO is formulated from the student perspective

– Outcome Based Learning (OBL)

• Other education technologies for teaching in fast technology changing

world

– Project Based Learning (PBL)

– Flipped classroom

– Activating teaching and

activating strategies

EDISON 2018 Slide DeckData Science Professional Education and

Training50

Page 51: EDISON Data Science Framework (EDSF)

Bloom’s Taxonomy and Knowledge Levels for

MC-DS

EDISON 2018 Slide DeckData Science Professional Education and

Training51

Level Action Verbs

Familiarity Choose, Classify, Collect, Compare, Configure, Contrast, Define, Demonstrate, Describe, Execute, Explain, Find, Identify, Illustrate, Label, List, Match, Name, Omit, Operate, Outline, Recall, Rephrase, Show, Summarize, Tell, Translate

Usage Apply, Analyze, Build, Construct, Develop, Examine, Experiment with, Identify, Infer, Inspect, Model, Motivate, Organize, Select, Simplify, Solve, Survey, Test for, Visualize

Assessment Adapt, Assess, Change, Combine, Compile, Compose, Conclude, Criticize, Create, Decide, Deduct, Defend, Design, Discuss, Determine, Disprove, Evaluate, Imagine, Improve, Influence, Invent, Judge, Justify, Optimize, Plan, Predict, Prioritize, Prove, Rate, Recommend, Solve

Page 52: EDISON Data Science Framework (EDSF)

Extended Bloom’s Taxonomy

Consolidated presentation

of learning levels, action

verbs, and associated

learning instruments

[ref] Anderson, L. W., &

Krathwohl, D. R. (2001). A

taxonomy for learning,

teaching, and assessing,

Abridged Edition. Boston, MA:

Allyn and Bacon.

EDISON 2018 Slide DeckData Science Professional Education and

Training52

Page 53: EDISON Data Science Framework (EDSF)

Bloom’s Taxonomy – Cognitive Activities

KnowledgeExhibit memory of previously learned materials by recalling facts, terms, basic concepts and answers

• Knowledge of specifics - terminology, specific facts

• Knowledge of ways and means of dealing with specifics - conventions, trends and sequences, classifications and categories, criteria, methodology

• Knowledge of the universals and abstractions in a field - principles and generalizations, theories and structures

• Questions like: What are the main benefits of implementing Big Data and data analytics methods for organisation?

ComprehensionDemonstrate understanding of facts and ideas by organizing, comparing, translating, interpreting, describing, and stating the main ideas

• Translation, Interpretation, Extrapolation

• Questions like: Compare the business and operational models of private clouds and hybrid clouds.

ApplicationUsing new knowledge. Solve problems in new situations by applying acquired knowledge, facts, techniques and rules in a different way

• Questions like: What data analytics methods should be applied for specific data types analysis or for specific business processes and activities?

Which Big Data services architecture is best suited for medium size research organisation or company, and why??

AnalysisExamine and break information into parts by identifying motives or causes. Make inferences and find evidence to support generalizations

• Analysis of elements, relationships, organizational principles

• Questions like: What data analytics methods and services are required to support typical business processes of a web trading company? Give suggestions how these

services can be implemented with the selected data analytics platform, including on-premises or outsourced to cloud. Provide references to support your statements.

SynthesisCompile information together in a different way by combining elements in a new pattern or proposing alternative solutions

• Production of a unique communication, a plan, or proposed set of operations, derivation of a set of abstract relations

• Questions like: Describe the main steps and tasks for implementing data analytics and data management services for an example company or research organisation?

What services and data analytics can be moved to clouds and which will remain at the enterprise premises and run by company’s personnel?

EvaluationPresent and defend opinions by making judgments about information, validity of ideas or quality of work based on a set of criteria

• Judgments in terms of internal evidence or external criteria

• Questions like: Do you think that implementing Agile Data Driven Enterprise model creates benefits for enterprises, short term and long term?

EDISON 2018 Slide DeckData Science Professional Education and

Training53

Page 54: EDISON Data Science Framework (EDSF)

Data Science Model Curriculum (MC-DS)

Data Science Model Curriculum includes

• Learning Outcomes (LO) definition based on CF-DS– LOs are defined for CF-DS competence groups and for all

enumerated competences

– Knowledge levels: Familiarity, Usage, Assessment (based in Bloom’s Taxonomy)

• LOs mapping to Learning Units (LU)– LUs are based on CCS(2012) and universities best practices

– Data Science university programmes and courses inventory (interactive)http://edison-project.eu/university-programs-list

• LU/course relevance: Mandatory Tier 1, Tier 2, Elective, Prerequisite

• Learning methods and learning models (in progress)

EDISON 2018 Slide DeckData Science Professional Education and

Training54

Page 55: EDISON Data Science Framework (EDSF)

Knowledge levels for Learning Outcomes

(defined based on Bloom’s Taxonomy)

Level Action Verbs

Familiarity Choose, Classify, Collect, Compare, Configure, Contrast, Define, Demonstrate, Describe, Execute, Explain, Find, Identify, Illustrate, Label, List, Match, Name, Omit, Operate, Outline, Recall, Rephrase, Show, Summarize, Tell, Translate

Usage Apply, Analyze, Build, Construct, Develop, Examine, Experiment with, Identify, Infer, Inspect, Model, Motivate, Organize, Select, Simplify, Solve, Survey, Test for, Visualize

Assessment Adapt, Assess, Change, Combine, Compile, Compose, Conclude, Criticize, Create, Decide, Deduct, Defend, Design, Discuss, Determine, Disprove, Evaluate, Imagine, Improve, Influence, Invent, Judge, Justify, Optimize, Plan, Predict, Prioritize, Prove, Rate, Recommend, Solve

EDISON 2018 Slide DeckData Science Professional Education and

Training55

Page 56: EDISON Data Science Framework (EDSF)

Data Science Data Analytics (KAG1 – DSDA)

related courses

• KA01.01 (DSDA/SMDA) Statistical methods, including Descriptive statistics,

exploratory data analysis (EDA) focused on discovering new features in the data,

and confirmatory data analysis (CDA) dealing with validating formulated

hypotheses;

• KA01.02 (DSDA/ML) Machine learning and related methods for information

search, image recognition, decision support, classification;

• KA01.03 (DSDA/DM) Data mining is a particular data analysis technique that

focuses on modelling and knowledge discovery for predictive rather than purely

descriptive purposes;

• KA01.04 (DSDA/TDM) Text analytics applies statistical, linguistic, and structural

techniques to extract and classify information from textual sources, a species of

unstructured data;

• KA01.05 (DSDA/PA) Predictive analytics focuses on application of statistical

models for predictive forecasting or classification;

• KA01.06 (DSDA/MODSIM) Computational modelling, simulation and optimisation.

EDISON 2018 Slide DeckData Science Professional Education and

Training56

Page 57: EDISON Data Science Framework (EDSF)

Data Science Engineering (KAG2-DSENG)

• KA02.01 (DSENG/BDI) Big Data infrastructure and technologies, including NOSQL databased, platforms for Big Data deployment and technologies for large-scale storage;

• KA02.02 (DSENG/DSIAPP) Infrastructure and platforms for Data Science applications, including typical frameworks such as Spark and Hadoop, data processing models and consideration of common data inputs at scale;

• KA02.03 (DSENG/CCT) Cloud Computing technologies for Big Data and Data Analytics;

• KA02.04 (DSENG/SEC) Data and Applications security, accountability, certification, and compliance;

• KA02.05 (DSENG/BDSE) Big Data systems organization and engineering, including approached to big data analysis and common MapReduce algorithms;

• KA02.06 (DSENG/DSAPPD) Data Science (Big Data) application design, including languages for big data (Python, R), tools and models for data presentation and visualization;

• KA02.07 (DSENG/IS) Information Systems, to support data-driven decision making, with focus on data warehouse and data centers.

EDISON 2018 Slide DeckData Science Professional Education and

Training57

Page 58: EDISON Data Science Framework (EDSF)

KAG3-DSDM: Data Management group: data curation,

preservation and data infrastructure

DM-BoK version 2 “Guide for

performing data management”

– 11 Knowledge Areas

(1) Data Governance

(2) Data Architecture

(3) Data Modelling and Design

(4) Data Storage and Operations

(5) Data Security

(6) Data Integration and

Interoperability

(7) Documents and Content

(8) Reference and Master Data

(9) Data Warehousing and Business

Intelligence

(10) Metadata

(11) Data Quality

EDISON 2018 Slide DeckData Science Professional Education and

Training

Other Knowledge Areas motivated by

RDA, European Open Data initiatives,

European Open Data Cloud

(12) PID, metadata, data registries

(13) Data Management Plan

(14) Open Science, Open Data, Open

Access, ORCID

(15) Responsible data use

58

• Highlighted in red: Considered (Research) Data Management literacy (minimum required knowledge)

Page 59: EDISON Data Science Framework (EDSF)

Outcome Based Educations and Training Model

From Competences and DSP

Profiles

to Learning Outcomes (LO)

and

to Knowledge Unites (KU) and

Learning Units (LU)

• EDSF allow for customized

educational courses and training

modules design

EDISON 2018 Slide DeckData Science Professional Education and

Training59

Page 60: EDISON Data Science Framework (EDSF)

EDSF Data Model and API

• EDSF API provides access to all EDSF functionality

EDISON 2018 Slide DeckData Science Professional Education and

Training60

Page 61: EDISON Data Science Framework (EDSF)

DSP04 – Data Scientist MC structure

EDISON 2018 Slide DeckData Science Professional Education and

Training61

Page 62: EDISON Data Science Framework (EDSF)

DSP10 – Data Steward MC structure

EDISON 2018 Slide DeckData Science Professional Education and

Training62

Page 63: EDISON Data Science Framework (EDSF)

DSP04 Data Scientist – Required practical skills

and Hands-on labs

Data Science curriculum should include the following elements to achieve necessary skills Type B:

• Python (or R) and corresponding data analytics libraries

• NoSQL and SQL Databases (Hbase, MongoDB, Cassandra, Redis, Accumulo, MS SQL, My SQL, PostgreSQL, etc.)

• Big Data Analytics platforms (Hadoop, Spark, Data Lakes, others)

• Real time and streaming analytics systems (Flume, Kafka, Storm)

• Kaggle competition, resources and community platform, including rich data sets, forum and computing resources

• Visualisation software (D3.js, Processing, Tableau, Julia, Raphael, etc.)

• Web API management and web scrapping

• Git versioning system as a general platform for software development

• Development Frameworks: Python, Java or C/C++, AJAX (Asynchronous Javascript and XML), D3.js (Data-Driven Documents), jQuery, others

• Cloud based Big Data and data analytics platforms and services, including large scale storage systems

– Essential for workplace alignment

EDISON 2018 Slide DeckData Science Professional Education and

Training63

Page 64: EDISON Data Science Framework (EDSF)

Hybrid Data Science Education Environment

(DSEE)

Hybrid DSEE and VDLabs extends regular compute and

storage resources with cloud based

• Microsoft Azure Data Lakes Analytics, Power BI, HDInsight

Hadoop as a Service, others

• AWS Elastic MapReduce (EMR), QuickSight, Kinesis and

wide collection of open datasets

• IBM Data Science Experience, Data Labs, Watson Analytics

• Google Cloud Platform (GCP)

EDISON 2018 Slide DeckData Science Professional Education and

Training64

Page 65: EDISON Data Science Framework (EDSF)

Competences assessment and Team building

• Data Science competences assessment

(benchmarking)

• Data Science team building

EDISON 2018 Slide DeckData Science Professional Education and

Training65

Page 66: EDISON Data Science Framework (EDSF)

Individual Competences Benchmarking

Individual Education/Training Path

based on Competence benchmarking

• Red polygon indicates the chosen

professional profile: Data Scientist

(general)

• Green polygon indicates the candidate or

practitioner competences/skills profile

• Insufficient competences (gaps) are

highlighted in red– DSDA01 – DSDA06 Data Science Analytics

– DSRM01 – DSRM05 Data Science Research

Methods

• Can be use for team skills match marking and

organisational skills management

[ref] For DSP Profiles definition and for enumerated

competences refer to EDSF documents CF-DS and DSP

Profiles.

EDISON 2018 Slide DeckData Science Professional Education and

Training66

Page 67: EDISON Data Science Framework (EDSF)

Competence/Knowledge gap -> Suggested

LUs/courses

Recommended courses

DSDA

• Statistical Methods

• Machine Learning

• Predictive and

Quantitative analytics

• Graph Data Analysis

• Data preparation and

preprocessing

• Performance Analysis

Recommended courses

DSRM

• Research Methods and

Project Management

EDISON 2018 Slide DeckData Science Professional Education and

Training67

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

DSDA01DSDA02

DSDA03

DSDA04

DSDA05

DSDA06

DSENG01

DSENG02

DSENG03

DSENG04

DSENG05

DSENG06

DSDM01

DSDM02

DSDM03DSDM04

DSDM05

DSDM06

DSRM01

DSRM02

DSRM03

DSRM04

DSRM05

DSRM06

DSBPM01

DSBPM02

DSBPM03

DSBPM04

DSBPM05

DSBPM06

GAPS FOR DSP04

Page 68: EDISON Data Science Framework (EDSF)

Mapping to career path:

Mastery Levels against Competences Relevance

Mastery levels defined using workplace terminology that can be easy mapped to mastery levels defined in MC-DS:

• A - Awareness 1) Understand Terminology

2) Understand Principles

3) Apply principles

4) Understand Methods

• U - Use/Application 5) Apply basics

6) Supervised use

7) Unsupervised Use

• P - Professional/Expert 8) Development of applications using wide range of technologies

9) Supervise project development, team of professionals

EDISON 2018 Slide DeckData Science Professional Education and

Training68

User/Professional level:

Measurable competences and

knowledge

Professional/Expert level:

Peer-reviewed competences

and skills

Page 69: EDISON Data Science Framework (EDSF)

Building a Data Science Team

EDISON 2018 Slide DeckData Science Professional Education and

Training69

Page 70: EDISON Data Science Framework (EDSF)

Data Science or Data Management Group/Department

• (Managing) Data Science Architect (1)

• Data Scientist (1), Data Analyst (1)

• Data Science Application programmer (2)

• Data Infrastructure/facilities administrator/operator: storage, cloud,

computing (1)

• Data stewards, curators, archivists (3-5)

Estimated: Group of 10-12 data specialists for research institution of 200-

300 research staff.

Growing role and demand for Data Stewards and data stewardship

EDISON 2018 Slide DeckData Science Professional Education and

Training70

Data Science or Data Management Group/Department:

Organisational structure and staffing - EXAMPLE

>> Reporting to CDO/CTO/CEO• Providing cross-organizational

services

Page 71: EDISON Data Science Framework (EDSF)

Data Stewardship in Research and FAIR

Principles

• FAIR Initiative by Dutch Techcentre for Life Science

(DTLS) – Prof. Barend Mons– Supported by Germany, France, Spain, UK, USA

– Part of Horizon 2020 Programme

• FAIR Principles for research data:

Findable – Accessible – Interoperable - Reusable

• Data Stewards as a key bridging role between Data

Scientists as (hard)core data experts and scientific

domain researchers (HLEG EOSC report)

• Current definition of the Data Steward (part of Data

Science Professional profiles)– Data Steward is a data handling and management

professional whose responsibilities include planning,

implementing and managing (research) data input, storage,

search, and presentation.

– Data Steward creates data model for domain specific data,

support and advice domain scientists/ researchers during the

whole research cycle and data management lifecycle.

EDISON 2018 Slide DeckData Science Professional Education and

Training71

HLEG report on

European Open

Science Cloud

(October 2016)

Page 72: EDISON Data Science Framework (EDSF)

Data Management and Governance (DMG) and

Research Data Management (RDM)

• RDM curricula example

• DAMA Data Management Body of Knowledge

(DMBOK)

EDISON 2018 Slide DeckData Science Professional Education and

Training72

Page 73: EDISON Data Science Framework (EDSF)

Research Data Management Model Curriculum –

Part of the EDISON Data Literacy Training

A. Use cases for data management and stewardship

• Preserving the Scientific Record

B. Data Management elements (organisational and individual)

• Goals and motivation for managing your data

• Data formats

• Creating documentation and metadata, metadata for discovery

• Using data portals and metadata registries

• Tracking Data Usage

• Handling sensitive data

• Backing up your data

• Data Management Plan (DMP) - to be a part of hands on session

C. Responsible Data Use Section (Citation, Copyright, Data Restrictions)

D. Open Science and Open Data (Definition, Standards, Open Data use and reuse, open government data)

• Research data and open access

• Repository and self- archiving services

• ORCID identifier for data

• Stakeholders and roles: engineer, librarian, researcher

• Open Data services: ORCID.org, Altmetric Doughnut, Zenodo

E. Hands on:

a) Data Management Plan design

b) Metadata and tools

c) Selection of licenses for open data and contents (e.g. Creative Common and Open Database)

EDISON 2018 Slide DeckData Science Professional Education and

Training73

Collaboration with the Research Data Alliance (RDA) on developing model curriculum on Research Data Literacy: • Modular, Customisable,

Localised, Open Access • Supported by the network of

trainers via resource swap board

Page 74: EDISON Data Science Framework (EDSF)

KAG3-DSDM: Data Management group: data curation,

preservation and data infrastructure

DM-BoK version 2 “Guide for

performing data management”

– 11 Knowledge Areas

(1) Data Governance

(2) Data Architecture

(3) Data Modelling and Design

(4) Data Storage and Operations

(5) Data Security

(6) Data Integration and

Interoperability

(7) Documents and Content

(8) Reference and Master Data

(9) Data Warehousing and Business

Intelligence

(10) Metadata

(11) Data Quality

EDISON 2018 Slide DeckData Science Professional Education and

Training

Other Knowledge Areas motivated by

RDA, European Open Data initiatives,

European Open Data Cloud

(12) PID, metadata, data registries

(13) Data Management Plan

(14) Open Science, Open Data, Open

Access, ORCID

(15) Responsible data use

74

• Highlighted in red: Considered (Research) Data Management literacy (minimum required knowledge)

Page 75: EDISON Data Science Framework (EDSF)

DMBOK: Data Governance

and Stewardship

Scope of a Data Governance

Programme

• Strategy

• Policy

• Standards and quality

• Oversight

• Compliance

• Issue management

• Data management projects

• Data asset valuation

EDISON 2018 Slide DeckData Science Professional Education and

Training75

Page 76: EDISON Data Science Framework (EDSF)

DMBOK: Data Management Principles

• Data is an asset with unique properties

• The value of data can and should be expressed in economic terms

• Managing data means managing the quality of data

• It takes Metadata to manage data

• It takes planning to manage data

• Data management requirements must drive Information Technology decisions

• Data management is cross-functional; it requires a range of skills and expertise

• Data management requires an enterprise perspective

• Data management must account for a range of perspectives

• Data management is lifecycle management

• Different types of data have different lifecycle characteristics

• Managing data includes managing the risks associated with data

• Effective data management requires leadership commitment

EDISON 2018 Slide DeckData Science Professional Education and

Training76

Page 77: EDISON Data Science Framework (EDSF)

DMBOK: Data Governance Organisation Parts

• Separation or

governance

responsibilities

• Multi-layer

• CDO

• CIO

• Councils

EDISON 2018 Slide DeckData Science Professional Education and

Training77

Page 78: EDISON Data Science Framework (EDSF)

Data Stewardship

• Creating and managing core Metadata: Definition and management of

business terminology, valid data values, and other critical Metadata.

• Documenting rules and standards: Definition/documentation of business rules,

data standards, and data quality rules.

– High quality data are often formulated in terms of rules rooted in the business

processes that create or consume data.

– Stewards help surface these rules and ensure their consistent use.

• Managing data quality issues: Stewards are often involved with the

identification and resolution of data related issues or in facilitating the process of

resolution.

• Executing operational data governance activities: Stewards are responsible

for ensuring that, day-today and project-by-project, data governance policies and

initiatives are adhered to. They should influence decisions to ensure that data is

managed in ways that support the overall goals of the organization.

“Best Data Steward is not made but found” DMBOK1 (2009)

EDISON 2018 Slide DeckData Science Professional Education and

Training78

Page 79: EDISON Data Science Framework (EDSF)

Roadmap recommendations: Data Science

Education and Training for Europe

A. Policy recommendations for the EC and Member States

• R1. Critical skills management and training

• R2. Gender balance and multi-cultural environment

• R3. Common data literacy

• R4. Data driven technology and education divide.

• R5. European studies on demand and role of Data Science and Analytics skills

• R6. Include the above actions in the European Digital Single Market Scoreboard

B. Recommendations to universities and professional training organisations

• R7. EDSF adoption by universities

• R8. Addressing Data Science professional and workplace skills in university curricula

• R9. Multiple delivery form

• R10. Sharing experience, courses and instructors

• R11. Supporting technical infrastructure for Data Science and data related education and professional training

C. Recommendations for Research (e) Infrastructures

• R12. Critical skills management in Research (e) Infrastructures

D. Required support and contribution to the standardisation bodies

• R13. The following are essential measures to achieve EDSF support existing and new standards.

EDISON 2018 Slide DeckData Science Professional Education and

Training79

EDISON project Deliverable D4.4 Content

• EU strategy for European Digital Single Market and European

Open Science Cloud

• Overview of Recent Studies on Data Science and Analytics skills

demand

• EDISON Data Science Framework (EDSF) and Educational

Model

• EDISON Data Science Framework (EDSF) and Educational

Model

• Priority, Timeline and ongoing activities

Page 80: EDISON Data Science Framework (EDSF)

Ongoing activities and developmentshttps://github.com/EDISONcommunity/EDSF

EDISON 2018 Slide DeckData Science Professional Education and

TrainingSlide_80

• EDISON Data Science Framework Release 3 – Published December 2018

– Call for contribution – EDSF use cases and guidelines - Deadline 30 March 2019

– Call for sponsorship

• Industry digitalisation projects and data literacy skills training– MATES project funded by EU ERASMUS Programme - EU Maritime industry digital

transformation, data skills development and Data + Ocean literacy

– Skills for SME: Data Science, IoT, Cybersecurity: Prepare to Data Economy and

Industry 4.0

• DARE project by APEC (Asia Pacific Economic Cooperation)

– Continuing cooperation for Asia Pacific region

– Recommended Data Science and Analytics Competences published August 2017 -

https://www.apec.org/Press/Features/2017/0620_DSA

Page 81: EDISON Data Science Framework (EDSF)

EDISON Initiative Online Presence

• EDSF github project - https://github.com/EDISONcommunity/EDSF

– Component documents CF-DS, DS-BoK, MC-DS, DSPP

• EDISON Community work area and discussions -

https://github.com/EDISONcommunity/EDSF/wiki/EDSFhome

• Mailing list - [email protected]

• EDISON project website (still active) http://edison-project.eu/

EDISON 2018 Slide DeckData Science Professional Education and

Training81

Page 82: EDISON Data Science Framework (EDSF)

Links to EDISON Resources

• EDISON Data Science Framework Release 3 (EDSF)

https://github.com/EDISONcommunity/EDSF

Component EDSF documents

CF-DS – Data Science Competence Framework

https://github.com/EDISONcommunity/EDSF/blob/master/EDISON_CF-DS-release2-v08.pdf

DS-BoK – Data Science Body of Knowledge

https://github.com/EDISONcommunity/EDSF/blob/master/EDISON_DS-BoK-release2-v04.pdf

MC-DS – Data Science Model Curriculum

https://github.com/EDISONcommunity/EDSF/blob/master/EDISON_MC-DS-release2-v03.pdf

DSPP – Data Science Professional profiles

https://github.com/EDISONcommunity/EDSF/blob/master/EDISON_DSPP-release2-v05.pdf

EDISON 2018 Slide DeckData Science Professional Education and

Training82

Page 83: EDISON Data Science Framework (EDSF)

Other related links

• Amsterdam School of Data Science

– https://www.schoolofdatascience.amsterdam/

– https://www.schoolofdatascience.amsterdam/education/

• Research Data Alliance interest Group on Education and Training on Handling of

Research Data (IG-ETHRD)

– https://www.rd-alliance.org/groups/education-and-training-handling-research-data.html

EDISON 2018 Slide DeckData Science Professional Education and

Training83

This work is licensed under the Creative Commons Attribution 4.0 International License.

To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Page 84: EDISON Data Science Framework (EDSF)

Additional materials

EDISON 2018 Slide DeckData Science Professional Education and

Training84

Page 85: EDISON Data Science Framework (EDSF)

EDSF Recognition, Endorsement and Implementation

• DARE (Data Analytics Rising Employment) project by APEC (Asia Pacific Economic Cooperation)

– DARE project Advisory Council meeting 4-5 May 2017, Singapore

• PcW and BHEF Report “Investing in America’s data science and analytics talent” April 2017

– Quotes EDSF and Amsterdam School of Data Science

• Dutch Ministry of Education recommended EDSF as a basis for university curricula on Data Science

– Workshop “Be Prepared for Big Data in the Cloud: Dutch Initiatives for personalized medicine and health research & toward a national action programmefor data science training”, Amsterdam 28 June 2016

– Currently working with Dutch Gov on re-skilling IT/data workers for DSA competences

• European Champion Universities network

– 1st Conference (13-14 July, UK)

– 2nd Conference (14-15 March, Madrid, Spain)

– 3rd Conference 19-20 June 2017, Warsaw

EDISON 2018 Slide DeckData Science Professional Education and

Training85

Page 86: EDISON Data Science Framework (EDSF)

e-CFv3.0 Internal Structure: Refactoring for CF-DS

• s

EDISON 2018 Slide DeckData Science Professional Education and

Training86

• 4 Dimensions– Competence Areas

– Competences

– Proficiency levels

– Skills and Knowledge

• 5 Competence Areas defined by ICT Business Process stages– Plan

– Build

– Deploy

– Run

– Manage

-> Refactor to Scientific Research (or Scientific Data) Lifecycle

– See example of RI manager at IG-ETRD wiki and meeting

• Each competence has 5 proficiency levels – Ranging from technical to engineering to

management to strategist/expert level

• Knowledge and skills property are defined for/by each competence and proficiency level (not unique)

Page 87: EDISON Data Science Framework (EDSF)

CWA 16458 (2012): European ICT Professional

Profiles

EDISON 2018 Slide DeckData Science Professional Education and

Training87

• The CWA defines 23 main ICT profiles the most widely used by organisations

Page 88: EDISON Data Science Framework (EDSF)

CWA Professional Profiles and Organisational

Workflow

EDISON 2018 Slide DeckData Science Professional Education and

Training88