Are data scientists fungible across sectors? Or is specialization within a sector very important?
Crisil Young Thought Leader 2014
Abhinav Tandon
10/26/2014
PGP 2013-2015
Indian Institute of Management, Kozhikode
Table of Contents
1. Executive Summary
2. Introduction: What is data science?
3. Who is a data scientist?
4. The key question: Are data scientists fungible across domains?
5. Conclusion & Recommendation
6. Bibliography
7. Appendix: Opinions from industry experts
1. Executive Summary
Effective utilization of Big Data has become the next source of competitive advantage among firms. Advances in technology have increased the number of sources from which companies can generate relevant data. They have also made it possible to store huge piles of data at relatively low cost. Converting this raw data into useful information requires the right balance of technology, mathematics and business.
At the core of this combination lies the data scientist. The data scientist identifies the sources of data, collates and prepares the data, identifies the relevant technology to manipulate it and derives actionable insights that can be used to further business objectives. The context in which the data scientist works has led to a debate on whether data scientists should specialize in a particular domain or be fungible across domains.
This paper proposes an answer to this question by analyzing the interests of all stakeholders in the analytics landscape through primary and secondary research. It attempts to capture an often-neglected point of view, that of data scientists in an Indian context. To do so, it uses a survey that identifies gaps between their needs and what the industry expects of them.
The analysis tries to balance the conflicting requirements of the various stakeholders by proposing organizational structures and policies. These structures and policies should serve both clients who require depth in domain knowledge and data scientists who want to remain relevant through cross-domain exposure. The paper acknowledges that analytics, though growing at unprecedented rates, is still a nascent industry and requires careful cultivation of the right talent to ensure positive outcomes.
Summary [276 words]
Report [2,223 words]
Appendix [837 words]
2. Introduction: What is data science?
The Harvard Business Review has declared data scientist the sexiest job of the 21st century. The trend is only beginning to catch on, but the practice of data science is a lot older than the label. Sources of data generation have increased exponentially, and so has the capability to store the petabytes of data generated daily. The art of collecting, cleaning and analyzing that information has therefore had to become much more complex to keep pace with technology, and this resulted in the birth of data science as we know it today.
Simply put, data science is the art of obtaining actionable insights from data by way of creating data products that provide information to decision makers while masking the underlying data and processes. Mu Sigma Business Solutions, a leader in the field of pure-play analytics, defines decision sciences as the ability to address the mix of business problems that organizations face daily. Its interdisciplinary approach combines business, applied math and technology with an appreciation for behavioral sciences. The firm describes this evolution in the following manner:
Yesterday
o Business + Technology allowed simple automation of processes
Today
o Math + Business allow more substantiated arguments
o Math + Technology allows forecasting and anticipation
o Math + Business + Technology allows better execution
Tomorrow
o Math + Business + Technology + Behavioral Science would help develop cognitive repairs against human biases
3. Who is a data scientist?
Definitions vary across papers and articles, and in truth there is no agreed-upon definition of a data scientist. However, a picture of a data scientist can be formed by putting together the following data-related roles:
BI professional: MIS professional who generates reports from structured data using data querying tools
Data analyst: Adds value to the BI professional's job by performing 'slice & dice' operations on the data to identify trends and segments. Also adds the element of visualization to the report
Business analyst: Brings in domain knowledge and understanding of the business to enable more complex operations such as forecasting
Big Data/Data Mining engineer: Performs a job similar to a data analyst's, but uses tools and techniques that operate on unstructured data
Statistician: Tests the obtained data for quality and correctness and runs appropriate statistical tests
Project Manager: Aligns business with data-backed findings. Possesses the communication and presentation skills necessary to convince leadership of a course of action
The pace of growth of Big Data has made it redundant to maintain the above-mentioned roles separately. This has led to their aggregation into a new role: the 'data scientist'.
The charts below identify the factors that differentiate a (Big) Data Scientist from other data professionals. They also compare the time spent on various data-related activities essential to a data scientist.
(Source: EMC data science study)
Booz Allen Hamilton summarizes the role of a data scientist in the following basic functions:
Collect: What are the possible sources of data and how can I put them all together?
Describe: How do I develop an understanding of the content of my data?
Discover: What are the key relationships in my data?
Predict: What are the likely future outcomes?
Advise: What course of action should be taken?
4. The key question: Are data scientists fungible across domains?
This is a hotly debated topic, and opinions on it vary with the context of whoever holds them. Answering it therefore requires a 360° view that takes into account the perspectives of the industry, the recruiters, the consumers of big data analytics and the employees (considered in an Indian context).
The recruiters
From the recruiter's point of view, the above definition of the data scientist points to the skills a recruiter seeks in a data scientist. A report by Booz Allen Hamilton lists them as follows:
Hard skills:
o Knowledge of computer science: provides the environment to test data-driven hypotheses, hence a working knowledge of data manipulation and processing is necessary
o Mathematics & statistics: provide a theoretical framework to analyze data science problems and algorithms
o Domain expertise: contributes an understanding of the kind of problem that needs to be solved, the nature of data that exists in the domain and the manner in which the problem space can be measured
Soft skills:
o Curiosity: to seek inter-relationships between data
o Creativity: to try new approaches to solve a problem
o Focus: to design and test new techniques
o Attention to detail: to maintain rigor and avoid over-reliance on intuition
It is immensely valuable to have knowledge of the domain in which the problem lies. Insights generated without domain knowledge make for less effective recommendations and can end up misleading the consumer. Data scientists who are also domain experts can better align their analysis with industry trends and organizational goals. They know which trends to capture and can make sense of anomalies thrown up by patterns. Domain knowledge influences how a data scientist imputes data, selects algorithms and sets the parameters for success. However, no single person can have expertise across a wide variety of domains. Domain knowledge backed by good communication and influencing skills can help an organization make sound, data-backed decisions and move in the right direction.
The consumers
The end consumers of big data analytics are of two kinds. Some organizations have formed an in-house team to generate data-driven insights, while others have outsourced this function to an analytics consulting firm. In the first case, the consumers are synonymous with recruiters and would lay strong emphasis on domain expertise while hiring for in-house teams. However, organizations that choose to outsource are looking for more exciting and dynamic environments. Such organizations want to transform and evolve their business models and to learn from across domains rather than only from their own industry. The table below provides a case in point. It shows how cross-pollination can help industries find innovative solutions to their problems by looking beyond the boundaries of their own industries:
(Source: Mu-Sigma.com)
The Industry
A 2011 McKinsey study found an alarming shortage of the analytics talent required to help companies deal with Big Data. A similar study by EMC later endorsed this finding. The McKinsey study also points out that only about a third of companies are able to use big data effectively for decision making. The nature of the business problems that the industry faces also evolves, going through various life stages:
Muddy problems are the problems that businesses encounter for the first time. There is very little idea about their nature, scope and solutions
As industries gain a better understanding of these problems, a picture begins to form, though the solution is still elusive. Such problems are fuzzy.
Eventually, after multiple iterations of solving the same problem, the logic and approach become fine-tuned and the problem becomes a clear problem
Muddy and fuzzy problems require creativity and careful thought in approach. The solutions are not straightforward, and one can end up finding them in the most unexpected places. We are witnessing a surge in data, but the big data analytics industry is still at a nascent stage. Hence most problems encountered are muddy or fuzzy. They require knowledge of 'where to look for the solution' rather than 'what the solution is'. In such a scenario, data scientists with cross-domain exposure can be an invaluable asset to organizations.
The employees: an Indian context
An often-ignored opinion is that of the employees, in this case the data scientists themselves. To understand their perspective, primary research was conducted on a sample of 67 data scientists. Before looking at the results, though, it is important to set the context of big data analytics in India. The Big Data movement in India is still in its nascent stage. It was primarily ushered in by pure-play analytics firms such as Mu Sigma and AbsolutData. These firms hired students fresh out of college and trained them in big data essentials across the host of domains to which their clients belonged. Though these companies continue to grow exponentially, some firms, especially in the e-commerce and banking sectors, have begun creating in-house analytics teams to keep sensitive data within the company. The depth and breadth of analysis, however, remain limited, and the work can become routine after a certain period.
Primary Research
Profile of Respondents
Some of the organizations represented in this survey are Amazon, Stryker, Novartis, Mu Sigma, Facebook, Fractal Analytics, Accenture, Capgemini, Tredence, Rapid Progress, Jurong Port etc.
They represent domains such as technology, media, telecom, banking & financial services, hospitality and entertainment, retail, pharmaceuticals, human resources, sports & weather. They also span analytics functions such as marketing, supply chain, risk, social media and sales.
While most respondents have between 1 and 5 years of experience, they have spent much less time in any particular domain or function. Most indicated only 1-2 years in a particular domain or function.
61% of the respondents indicated that their primary reason for choosing analytics as a starting career option was that it provides the right balance between technical know-how and business knowledge. 46% stated that they enjoyed the autonomy that came with working in a new, unexplored field.
70% of respondents indicated that they had no role in selecting their domain or function as a data scientist.
39% of respondents indicated that their domain knowledge was only sufficient to discuss problems with clients.
Opinions of respondents:
85% of respondents indicated that work tends to become routine after serving in a particular domain or function for too long. 46% indicated that 2-3 years is the ideal time period to acquire expertise in a particular domain or function.
Objectives of respondents:
92% of respondents indicated that they consider analytics a long-term career option. At the same time, 80% indicated that they wish to pursue higher education, with 50% preferring an MBA. 65% indicated that they were interested in business consulting as an alternative career option over the longer term.
Conclusions from analysis:
Data scientists are very young, with some firms reporting an average age as low as 25 years
Due to the nascent stage of the industry, most data scientists prefer shifting domains and functions to avoid the boredom that sets in as work becomes routine
Given their young age, analytics professionals wish to pursue higher education. They look to analytics to gain business knowledge across different domains, knowledge that most data scientists lack because of their technical background
For most respondents (70%), personal preference played no role in their choice of domain or function. This could result in a misalignment of interests and a possible wish to shift domains and functions
5. Conclusion & Recommendation
The above analysis shows that we face the complex challenge of balancing the conflicting requirements of the various stakeholders in the analytics space. Organizations need specialization, while data scientists need to remain relevant and fulfill their long-term career objectives. Balancing these needs requires a hybrid approach that satisfies both. This can be achieved primarily by instituting relevant hiring and training policies and creating an appropriate organizational structure. The proposed solution is based on the analysis, available literature and opinions from industry experts (see Appendix).
Bill Coughran, a former SVP at Google, states that collaboration is important to data scientists because it enables interaction and brainstorming and keeps them challenging each other. This requires hiring people with the right level of intellectual curiosity, something that, unlike technical skills, cannot be taught. Instead of looking only for strong technical proficiency, a firm should hire people with experience across different industries and a strong appetite for problem solving. Creating a conducive environment would then require building a high-performing, cross-functional team to align data analytics with business goals. Its members would include statisticians, graphic designers, programmers and business decision makers. Data scientists need opportunities for discovery-driven work rather than problem-driven work. This maximizes their potential and helps organizations make new discoveries.
Data scientists in India, who in most cases are fresh graduates, should be allowed to work across different domains and functions in their formative years as analytics professionals. The initial domain should be assigned on the basis of both personal choice and the relevance of their academic degrees. Rotation should be allowed for the first 6 years to ensure sufficient exposure to at least 3 different industries. This gives roughly 2 years per domain, in line with the 2-3 years that respondents considered ideal for acquiring expertise. After this formative period, a process of specialization should follow in which the data scientist pursues a particular domain in greater depth, making it his or her likely career path.
From an organizational-structure point of view, this calls for a diffused data science team following a federated model. In this model, the analytics team is united under a Chief Data Scientist but works with different operational areas for short and long periods. It coordinates under a centralized structure to ensure total business alignment. This model would ensure ownership and autonomy, a factor that came out as important to data scientists in the survey. It would also ensure cross-pollination of ideas and avoid boredom by moving data scientists across different teams, while ensuring that organizational goals are met.
6. Bibliography
O'Reilly (2011), Big Data Now: Current Perspectives from O'Reilly Radar. O'Reilly Media
D.J. Patil (2011), Building Data Science Teams: The Skills, Tools and Perspectives Behind Great Data Science Groups. O'Reilly Media
Stijn Viaene (2013), Data Scientists Aren't Domain Experts. Vlerick Business School and KU Leuven
EMC data science study (2011)
Thomas H. Davenport and D.J. Patil (2012), Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review
Booz Allen Hamilton (2013), The Field Guide to Data Science. Booz Allen Hamilton
Web links:
http://www.quora.com/What-are-the-key-skills-of-a-data-scientist
http://www.mu-sigma.com/analytics/blog/
http://www.datasciencecentral.com/profiles/blogs/data-scientist-core-skills
7. Appendix: Opinions from industry experts
Profile of Respondents:
Respondent 1: Associate Director with Mu Sigma having 9 years of work experience in
analytics across different verticals
Respondent 2: Analytics strategist with IMS Health having 12 years of experience in Life
Sciences and Healthcare
Respondent 3: President of a data mining and predictive modeling firm in the US with over 25
years of experience
1. What is your opinion on domain and functional specialization? Do you feel data
scientists should be made to stick to a domain or should they be rotated between functions
and domains?
Respondent 1: I think they should be rotated between functions and domains because that will help cross-pollinate ideas, with ideas learnt in one domain being applied in another, which otherwise won't happen.
a. Ex: Weather and Retail domain. A person performing weather analytics, say for a weather company, can bring in new ideas, like incorporating weather-related variables and understanding their impact on retail sales. That might help in better stocking of goods. These ideas spread fast due to cross-domain exposure, and might get delayed otherwise
b. Said that a person should stay in a particular domain at least for 2-3 yrs before moving to
another one
Respondent 2: Rotation to a level, and then stick to one
Respondent 3: Though many think that what we do is new and revolutionary, I see it more as
evolutionary. This moment is a point on a geometric timeline that existed before now and will
extend into the future. What we're doing is not new though evolution has provided tools that
advance how we do it. As such, I compare it to being a doctor. In past days, a doctor was more of
a generalist but as the science has evolved, today you have medical specialists in addition to
generalists. We're seeing the same thing happen in our world of analytics. I'm not sure “made to
stick” is a good way to look at it. “Prefer to stick” would be a better characterization. If an
analyst has a passion to specialize in a domain, there would be nothing wrong with it. There also
is nothing wrong with being a generalist. Both are valid and could be wise choices for an
individual.
2. In the decision sciences industry, do clients prefer going to organizations where they can
leverage more cross industry exposure or do they prefer organizations that can offer them
domain expertise?
Respondent 1: I think that in the recent past clients have wanted to leverage vendors who provide cross-industry exposure. They realise that they can provide the domain knowledge themselves if the vendor lacks it (but vendors do have people with domain expertise as well)
Respondent 2: Cross-industry, as bigger firms can give economies of scale; but if they want a low-cost ad hoc activity, then domain expertise can come in
Respondent 3: Both. As a services provider, our clients like to see domain expertise. However, they're even more impressed and attracted if your experience is broader. It's better to have it and not need it than it is to need it and not have it.
3. Are there possible growth options and career trajectories in the analytics/decision
sciences industries? Is the industry well developed at the upper echelons of companies for
people to consider it as a long term option?
Respondent 1: Yes, I think so. Clients have started to realize the value of analytics and they
have started to build that capability in-house or at least they have dedicated leadership people
responsible for setting up the Analytics Centre of Excellence (CoE) and partner with Analytics
vendors to extract the value out of their data.
Respondent 2: Yes, the future is analytics and decision making. Tech companies will also move that way.
Respondent 3: It really, really depends on the industry and then the particular business within
the industry. For example, in general, Financial services have been doing what we do for a long
time … big data, modeling … all of it. So they're advanced. However, within financial services,
you have insurance companies. They're good at actuarial things but have tended to lag the rest of
the industry when it comes to data or decision science. There are a few large insurance
companies that are more advanced but the many smaller insurers tend to be behind the curve.
4. How do you see the analytics growth story panning out in the future? Is it a mere fad or
is it here to stay?
Respondent 1: It is here to stay, as time and again clients have proven the value of analyzing their data and making decisions based on what analytics provides. Ex: Target, Walmart, P&G, AT&T etc., who are all champions in this space
Respondent 2: Here to stay. Even the FIFA World Cup winner and IPL winner KKR used analytics
Respondent 3: Not even close to being a fad. On the contrary, I believe it will continue to grow
geometrically and will continue to develop specialties. It will become pervasive and ubiquitous.