Are Data Scientists fungible across sectors Rank 2 Abhinav

Are data scientists fungible across sectors? Or is specialization within a sector very

important?

Crisil Young Thought Leader 2014

Abhinav Tandon

10/26/2014

PGP 2013-2015

Indian Institute of Management, Kozhikode

Table of Contents

1. Executive Summary

2. Introduction: What is data science?

3. Who is a data scientist?

4. The key question: Are data scientists fungible across domains?

5. Conclusion & Recommendation

6. Bibliography

7. Appendix: Opinions from industry experts

1. Executive Summary

Effective utilization of Big Data has become the next source of competitive advantage between

firms. Advancements in technology have led to an increase in the number of sources from

which companies can generate relevant data and have also made it possible to store huge

piles of data at relatively low cost. Converting this raw data into useful information requires

the right balance of technology, mathematics and business.

At the core of this combination lies the data scientist, who identifies the sources of data, collates

and prepares the data, identifies relevant technology to manipulate data and derives actionable

insights that can be put to use to further business objectives. The context in which the data

scientist works has led to the debate on whether data scientists should specialize in a particular

domain or be fungible across domains.

This paper aims at proposing a solution to this question by analyzing the interests of all

stakeholders in the analytics landscape by means of primary and secondary research. The paper

attempts to capture an often neglected point of view, that of data scientists in an Indian

context, by conducting a survey to identify gaps between their needs and what the industry expects

of them.

The analysis tries to balance the conflicting requirements of the various stakeholders by

proposing organizational structures and policies that can cater to the needs of both clients

requiring depth in domain knowledge and data scientists looking to remain relevant through

cross-domain exposure. The paper acknowledges that analytics, though growing at

unprecedented rates, is still a nascent industry and requires proper harvesting of the right talent to

ensure positive outcomes.

Summary [276 words]

Report [2,223 words]

Appendix [837 words]

2. Introduction: What is data science?

Data scientist was declared the sexiest job of the 21st century by the Harvard Business Review.

While the trend is just beginning to catch on, data science as a practice is a lot older than the

label. With the exponential increase in the sources of data generation and the increasing

capability to store the petabytes of data being generated on a daily basis, the art of collecting,

cleaning and analyzing that information has had to become much more complex to keep pace

with the capabilities of technology, resulting in the birth of data science as we know it today.

Simply put, data science is the art of obtaining actionable insights from data by

creating data products that provide information to decision makers while masking the underlying

data and processes. Mu Sigma Business Solutions, a leader in the field of pure play analytics,

defines decision sciences as the ability to address a mix of business problems that organizations

face on a daily basis using an interdisciplinary approach of business, applied math and technology,

along with an appreciation for behavioral sciences. The firm defines this evolution in the

following manner:

Yesterday

o Business + Technology allowed simple automation of processes

Today

o Math + Business allows more substantiated arguments

o Math + Technology allows forecasting and anticipation

o Math + Business + Technology allows better execution

Tomorrow

o Math + Business + Technology + Behavioral Science would help develop cognitive repairs against human biases

3. Who is a data scientist?

While definitions vary across papers and articles, in truth there is no agreed upon definition of

who a data scientist is. However, a picture of a data scientist can be formed by putting together

these data-related roles:

BI professional: An MIS professional who generates reports from structured data using

data querying tools

Data analyst: Adds value to the BI professional's job by performing 'slice & dice'

operations on the data to identify trends and segments. Also adds the element of

visualization to the report

Business analyst: Brings in domain knowledge and understanding of the business to

enable more complex operations such as forecasting

Big Data/Data Mining engineer: Does a job similar to a data analyst's, but uses tools and

techniques suited to unstructured data

Statistician: Tests the obtained data for quality and correctness and runs appropriate

statistical tests

Project Manager: Aligns business with data-backed findings. Possesses

communication and presentation skills necessary to convince leadership of a course of

action

The pace of growth of Big Data has made it unnecessary to maintain the above-mentioned

individual roles, leading to their aggregation into the new role of the 'data scientist'.

The charts below identify the factors that differentiate a (Big) Data Scientist from other data

professionals. They also compare the time spent on various data-related activities essential to a data

scientist.

(Source: EMC data science study)

Booz Allen Hamilton summarizes the role of a data scientist in the following basic functions:

Collect: What are the possible sources of data and how can I put them all together?

Describe: How do I develop an understanding of the content of my data?

Discover: What are the key relationships in my data?

Predict: What are the likely future outcomes?

Advise: What course of action should be taken?
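The five functions above can be pictured as stages of a small pipeline. The following is only a minimal sketch under stated assumptions: the function names, field names and toy figures are hypothetical and do not come from the Booz Allen Hamilton guide, and a simple least-squares line stands in for whatever modeling technique a real team would choose. The weather and retail theme borrows from the example given by Respondent 1 in the Appendix.

```python
from statistics import mean

def collect(*sources):
    """Collect: combine records from several sources into one list."""
    return [row for source in sources for row in source]

def describe(rows, field):
    """Describe: summarize one numeric field of the combined data."""
    values = [row[field] for row in rows]
    return {"count": len(values), "min": min(values),
            "max": max(values), "mean": mean(values)}

def discover(rows, x_field, y_field):
    """Discover: fit y = intercept + slope * x by ordinary least squares."""
    xs = [row[x_field] for row in rows]
    ys = [row[y_field] for row in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def predict(model, x):
    """Predict: apply the fitted line to a new input value."""
    intercept, slope = model
    return intercept + slope * x

def advise(forecast, stock_on_hand):
    """Advise: turn the forecast into a recommended action."""
    return "restock" if forecast > stock_on_hand else "hold"

# Hypothetical toy data: daily temperature and units sold at two stores.
store_a = [{"temp_c": 20, "units": 110}, {"temp_c": 25, "units": 135}]
store_b = [{"temp_c": 30, "units": 160}, {"temp_c": 35, "units": 185}]

rows = collect(store_a, store_b)
summary = describe(rows, "units")
model = discover(rows, "temp_c", "units")
forecast = predict(model, 40)                    # 210.0 for this toy data
decision = advise(forecast, stock_on_hand=200)   # "restock"
print(summary, forecast, decision)
```

Even in this toy form, the Advise step depends on a business threshold (stock on hand) that the data alone does not supply, which is where domain knowledge enters the picture.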

4. The key question: Are data scientists fungible across domains?

A hotly debated topic, opinions on this issue vary depending on the context of the opinion

maker. Hence the answer requires a 360° view, taking into account the perspectives of the

industry, the recruiters, the consumers of big data analytics and the employees (considered in an

Indian context).

The recruiters

From the recruiter's point of view, the above definition of the data scientist points to the

following skills that a recruiter seeks in a data scientist, based on a report by Booz Allen

Hamilton:

Hard skills:

o Knowledge of computer science: provides the environment to test data-driven

hypotheses, and hence a working knowledge of data manipulation and processing

is necessary

o Mathematics & statistics: provide a theoretical framework to analyze data science

problems and algorithms

o Domain expertise: contributes to an understanding of the kind of problem that

needs to be solved, the nature of data that exists in the domain and the manner in

which the problem space can be measured

Soft skills:

o Curiosity: to seek inter-relationships between data

o Creativity: to try new approaches to solve a problem

o Focus: to design and test new techniques

o Attention to detail: to maintain rigor and avoid over-reliance on intuition

It is immensely valuable to have knowledge of the domain in which the problem lies. Insights

without domain knowledge limit the effectiveness of the recommendations and can end up

misleading the consumer. Data scientists who are also domain experts can better align their

analysis with industry trends and organizational goals, knowing which trends to capture and

how to make sense of anomalies thrown up by patterns. Domain knowledge influences how a data

scientist imputes data, selects algorithms and determines the parameters for their success. It is not

possible for a single person to have expertise across a wide variety of domains. Domain

knowledge backed by good communication and influencing skills can help an organization in

making sound data backed decisions and move in the right direction.

The consumers

The end consumers of big data analytics can either be organizations that have formed an in-

house team to generate data driven insights or organizations that have outsourced this function to

an analytics consulting firm. In the first case, the consumers are synonymous with recruiters and

would lay strong emphasis on domain expertise while hiring for in-house teams. However,

organizations that choose to outsource are looking for more exciting and dynamic environments.

Such organizations are looking to transform and evolve their business models and learn from

across domains rather than from within the same industry. The table below provides a case in point of how

cross-pollination can help industries find innovative solutions to their problems by looking

beyond the boundaries of their industries:

(Source: Mu-Sigma.com)

The Industry

A 2011 McKinsey study found that there was an alarming shortage of the analytics talent required to

help companies deal with Big Data. This study, later endorsed by a similar study by EMC,

points out that only about a third of companies are able to use big data effectively for decision

making. The nature of business problems that the industry faces also evolves and goes through

various life stages:

Muddy problems are the problems that businesses encounter for the first time. There is

very little idea about their nature, scope and solutions

As industries start gaining better understanding of these problems, a picture begins to

form though the solution is still elusive. Such problems are fuzzy.

Eventually after multiple iterations of solving the same problem, the logic and approach

becomes fine-tuned and the problem becomes a clear problem

Muddy and fuzzy problems require creativity and careful thought in approach. The solutions are

not straightforward and one can end up finding the solutions in the most unexpected places.

While we are witnessing a surge in data, the big data analytics industry is still at a nascent

stage. Hence most problems encountered are of a muddy or fuzzy nature, requiring knowledge of

'where to look for the solution?' rather than 'what is the solution?' In such a scenario, data scientists

with cross-domain exposure can be an invaluable asset to organizations.

The employees: an Indian context

An opinion that is often ignored is that of the employees or the data scientists in this case. To

understand their perspective, primary research was conducted on a sample of 67 data scientists.

Before looking at the results though, it is important to set the context regarding big data analytics

in India. The Big Data movement in India is still in its nascent stage and was primarily ushered

in by pure-play analytics firms such as Mu Sigma and AbsolutData, which became employers of

students fresh out of college, providing them with training in big data essentials across a host of

domains to which their clients belonged. Though these companies continue to grow

exponentially, some firms, especially in the e-commerce and banking sectors, have begun creating

in-house analytics teams with a view to keeping sensitive data close to the company. The depth

and breadth of analysis in such teams remains limited, and the work can get routine after a certain period.

Primary Research

Profile of Respondents

Some of the organizations represented in this survey are Amazon, Stryker, Novartis,

Mu Sigma, Facebook, Fractal Analytics, Accenture, Capgemini, Tredence, Rapid Progress,

Jurong Port etc.

They represent domains such as technology, media, telecom, banking & financial services,

hospitality and entertainment, retail, pharmaceuticals, human resources, sports & weather across

analytics functions such as marketing, supply chain, risk, social media and sales.

While most respondents have between 1 and 5 years of experience, it is interesting to note that they

have spent much less time in any particular domain or function, with most indicating only 1-2

years in a particular domain or function.

61% of the respondents indicated that their primary reason for choosing analytics as a starting

career option was that it provides the right balance between technical know-how and

business knowledge. 46% stated that they enjoyed the autonomy that came with working in a

new unexplored field.

70% of respondents indicated that they had no role in selecting their domain or function as a data

scientist.

39% of respondents indicated that their domain knowledge was sufficient only to discuss

problems with clients.

Opinions of respondents:

85% of respondents indicated that work tends to get routine after serving in a particular domain or

function for too long and 46% indicated that 2-3 years is the ideal time period to acquire

expertise in a particular domain/function.

Objectives of respondents:

While 92% of respondents indicated that they consider analytics a long-term career option, 80% also

indicated that they wish to pursue higher education, with 50% preferring to go for an MBA. 65%

indicated that they were interested in business consulting as an alternative career option over a

longer time period.
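To give a sense of scale for these percentages, the short sketch below converts a few of them into approximate head counts out of the 67 respondents. It assumes that every respondent answered every question, which the survey description does not confirm, so the counts are rough indications only. The dictionary labels are shorthand introduced here for illustration.

```python
SAMPLE_SIZE = 67  # number of data scientists surveyed, as stated in the text

# Percentages as reported in the findings above (labels are shorthand).
reported_percent = {
    "chose analytics for the technical/business balance": 61,
    "enjoy the autonomy of a new field": 46,
    "no role in selecting domain or function": 70,
    "domain knowledge sufficient only to discuss problems": 39,
    "work gets routine in one domain": 85,
    "consider analytics a long-term career": 92,
    "wish to pursue higher education": 80,
}

def approx_count(percent, sample_size=SAMPLE_SIZE):
    """Nearest whole number of respondents (integer arithmetic, halves round up)."""
    return (percent * sample_size + 50) // 100

approx_counts = {label: approx_count(p) for label, p in reported_percent.items()}

for label, count in approx_counts.items():
    print(f"{label}: about {count} of {SAMPLE_SIZE}")
```

With a sample of this size, each respondent accounts for roughly 1.5 percentage points, so small differences between the reported figures should be read with caution.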

Conclusion from analysis:

The average age of data scientists is very young with some firms claiming an average age

of as low as 25 years

Due to the nascent stage of the industry, most data scientists prefer shifting domains and

functions to avoid boredom as work becomes routine

Considering their young age, analytics professionals wish to pursue higher education and

look to analytics to gain requisite business knowledge across different domains, which

they lack given the technical background of most data scientists

The survey indicates that personal preferences play little role in their choice of domain

or function which could result in a misalignment of interests and a possible wish to shift

domains and functions

5. Conclusion & Recommendation

Based on the above analysis, we understand that we're faced with a complex challenge of

balancing the conflicting requirements of the various stakeholders in the analytics space. The

need of organizations to specialize is countered by the need of data scientists to remain relevant

and fulfill their long term career objectives. Balancing these objectives requires a hybrid

approach that satisfactorily achieves both these objectives. This can be primarily achieved by

instituting relevant hiring and training policies and creating an appropriate organizational

structure. The solution proposed is based on the analysis, available literature and opinions from

industry experts (see Appendix).

Bill Coughran, a former SVP at Google, states that collaboration is important to data scientists as

it enables interaction, brainstorming and keeping each other challenged. This requires hiring

people with the right levels of intellectual curiosity, something that cannot be taught unlike

technical skills. Instead of looking for only strong technical proficiency, a firm should look to

hire people with experiences with different industries and a strong appetite for problem solving.

Creating a conducive environment would then require creating a high-performing, cross-

functional team with members including statisticians, graphic designers, programmers and

business decision makers to align data analytics with business goals. Data scientists need to be

given opportunities to do discovery-driven work rather than problem-driven work to maximize their

potential and help organizations make discoveries.

Data scientists in India, who in most cases are fresh graduates, should be allowed to work across

different domains and functions in their formative years as analytics professionals. The initial

domain should be assigned both on the basis of choice and relevance of their academic degrees

and rotation should be allowed for the first 6 years to ensure sufficient exposure to at least 3

different industries. Post the formative period, a process of specialization must ensue in which

the data scientist is made to pursue a particular domain with greater depth making it his or her

likely career path.
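The rotation policy proposed above can be sketched as a simple schedule. This is an illustrative sketch only: the function name `rotation_schedule` is hypothetical, the equal split of the six formative years is an assumption, and the three domains are borrowed from the list of domains represented in the survey.

```python
def rotation_schedule(domains, formative_years=6):
    """Split the formative period as evenly as possible across the domains.

    Returns a list of (domain, first_year, last_year) tuples, years 1-based.
    """
    if not domains:
        raise ValueError("at least one domain is required")
    if len(domains) > formative_years:
        raise ValueError("more domains than years in the formative period")
    base, extra = divmod(formative_years, len(domains))
    schedule, start = [], 1
    for index, domain in enumerate(domains):
        # Earlier domains absorb any leftover years when the split is uneven.
        length = base + (1 if index < extra else 0)
        schedule.append((domain, start, start + length - 1))
        start += length
    return schedule

plan = rotation_schedule(["retail", "banking", "pharmaceuticals"])
for domain, first, last in plan:
    print(f"Years {first}-{last}: {domain}")
```

With three domains over six years, each stint lasts two years, which sits at the lower end of the 2-3 years that survey respondents considered ideal for acquiring expertise in a domain.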

From an organizational structure point of view, this calls for creating a diffused data science

team following a federated model, in which the analytics team is united under a Chief Data

Scientist but its members work with different operational areas for short- and long-term periods, coordinating

under a centralized structure to ensure total business alignment. This model would ensure

ownership and autonomy, a factor that came out as important to data scientists in the survey. It

would also ensure cross-pollination of ideas, avoiding boredom by moving data scientists across

different teams while at the same time ensuring that organizational goals are being met.

6. Bibliography

O'Reilly (2011), Big Data Now: Current perspectives from O'Reilly Radar. O'Reilly Media

D.J. Patil (2011), Building Data Science Teams: The Skills, Tools and Perspectives Behind Great

Data Science Groups. O'Reilly Media

Stijn Viaene and KU Leuven (2013), Data Scientists aren't domain experts. Vlerick Business

School

EMC data science study (2011)

Thomas H. Davenport and D.J. Patil (2012), Data Scientist: The Sexiest Job of the 21st Century.

Harvard Business Review

Booz Allen Hamilton (2013), The field guide to data science. Booz Allen Hamilton

Web links:

http://www.quora.com/What-are-the-key-skills-of-a-data-scientist

http://www.mu-sigma.com/analytics/blog/

http://www.datasciencecentral.com/profiles/blogs/data-scientist-core-skills

7. Appendix: Opinions from industry experts

Profile of Respondents:

Respondent 1: Associate Director with Mu Sigma having 9 years of work experience in

analytics across different verticals

Respondent 2: Analytics strategist with IMS Health having 12 years of experience in Life

Sciences and Healthcare

Respondent 3: President of a data mining and predictive modeling firm in the US with over 25

years of experience

1. What is your opinion on domain and functional specialization? Do you feel data

scientists should be made to stick to a domain or should they be rotated between functions

and domains?

Respondent 1: I think they should be rotated between functions and Domains because that will

help cross-pollinate the ideas learnt from 1 domain being applied in another which otherwise

won't happen.

a. Ex: Weather and Retail domain. A guy performing weather analytics say for a weather

company, can bring in new ideas like incorporating weather related variables and understand

impact of that over Retail sales and that might help in better Stocking of goods. These ideas float

fast due to cross domain, and might get delayed otherwise

b. That said, a person should stay in a particular domain for at least 2-3 yrs before moving to

another one

Respondent 2: Rotation to a level, and then stick to one

Respondent 3: Though many think that what we do is new and revolutionary, I see it more as

evolutionary. This moment is a point on a geometric timeline that existed before now and will

extend into the future. What we're doing is not new though evolution has provided tools that

advance how we do it. As such, I compare it to being a doctor. In past days, a doctor was more of

a generalist but as the science has evolved, today you have medical specialists in addition to

generalists. We're seeing the same thing happen in our world of analytics. I'm not sure “made to

stick” is a good way to look at it. “Prefer to stick” would be a better characterization. If an

analyst has a passion to specialize in a domain, there would be nothing wrong with it. There also

is nothing wrong with being a generalist. Both are valid and could be wise choices for an

individual.

2. In the decision sciences industry, do clients prefer going to organizations where they can

leverage more cross industry exposure or do they prefer organizations that can offer them

domain expertise?

Respondent 1: I think it's in recent past Clients want to leverage vendors who provide cross

industry exposure and they realise that they can provide the Domain knowledge if it lacks in

Vendor (but vendors do have people with Domain expertise as well)

Respondent 2: cross.... as bigger firms can give economies of scale, but if they want a low cost

adhoc activity then domain can come in

Respondent 3: Both. As a services provider, our clients like to see domain expertise. However,

they're even more impressed and attracted if your experience is broader. It's better to have it and

not need it than it is to need it and not have it.

3. Are there possible growth options and career trajectories in the analytics/decision

sciences industries? Is the industry well developed at the upper echelons of companies for

people to consider it as a long term option?

Respondent 1: Yes, I think so. Clients have started to realize the value of analytics and they

have started to build that capability in-house or at least they have dedicated leadership people

responsible for setting up the Analytics Centre of Excellence (CoE) and partner with Analytics

vendors to extract the value out of their data.

Respondent 2: yes the future is analytics and decision making. tech companies will also move

towards that way

Respondent 3: It really, really depends on the industry and then the particular business within

the industry. For example, in general, Financial services have been doing what we do for a long

time … big data, modeling … all of it. So they're advanced. However, within financial services,

you have insurance companies. They're good at actuarial things but have tended to lag the rest of

the industry when it comes to data or decision science. There are a few large insurance

companies that are more advanced but the many smaller insurers tend to be behind the curve.

4. How do you see the analytics growth story panning out in the future? Is it a mere fad or

is it here to stay?

Respondent 1: It is here to stay, as time and again Clients have proven the value of analyzing

their data and taking decisions based on what analytics provides from their data. Ex: Target,

Walmart, P&G, AT&T etc. who are all Champs in this space

Respondent 2: here to stay. even FIFA winner and IPL winner KKR used analytics

Respondent 3: Not even close to being a fad. On the contrary, I believe it will continue to grow

geometrically and will continue to develop specialties. It will become pervasive and ubiquitous.