GUIDEBOOK

SAS ANALYTICS AND OPEN SOURCE

April 2014 Document O75

© 2014 Nucleus Research, Inc. Reproduction in whole or in part without written permission is prohibited.

Nucleus Research is the leading provider of value-focused technology research and advice.

NucleusResearch.com

Phone: +1 617.720.2000

Nucleus Research Inc.

100 State Street

Boston, MA 02109

THE BOTTOM LINE

Many organizations balance open source solutions with commercial software to meet the requirements for statistical analysis both within their organizations and externally with regulatory bodies. While open source analytic tools offer a robust online community and an extensive array of algorithms, packaged analytics software companies, most notably SAS, offer the performance, scalability, governance, and support organizations require for production and operational analytics. Nucleus found that in most of these organizations, open source and SAS are quickly becoming complementary partners, where new hires’ expertise and the ability to be nimble in conducting research and analysis bring benefits and new approaches that can become part of an enterprise analytics implementation.

THE SITUATION

In the early 1990s, when Linux first appeared on the scene, many did not see the impact that a free, open source operating system would have on the market. It was developed for a very small niche market and was little known outside of that group. That perception changed dramatically over the next 10 years, and its place in the technology and software space is now clearly entrenched. Linux has significantly disrupted the operating system market. As more hardware and software vendors began to develop and release their own versions and integrate it into solutions, market awareness and acceptance of Linux quickly developed. The Linux kernel is now as much a part of the operating system market as any of the others.

The same dynamic is being felt in the analytics space, with the growing acceptance of open source programming languages such as R and Python. These open source programming languages also started with small online communities and were seen to fit a small niche market, but that is quickly changing. As vendors such as Revolution Analytics start to develop and sell packages using open source, the move of open source to commercial software will continue to progress. Nucleus has found that in many organizations, the work and output from open source development are being used within commercial software implementations. With the expertise of the analysts, as well as the algorithms and analysis being developed, organizations are quickly realizing benefits and seeing results in their enterprise analytics implementations. One such commercial software vendor, SAS, has thousands of users around the world, and many of those organizations are using open source to extend their SAS implementations.

SAS is used by organizations to perform deep statistical analysis in numerous industries including financial services, healthcare, manufacturing, hospitality, and others that need to analyze large volumes of data. In its analysis of SAS, Nucleus found key benefits from the solution include improved decision making, increased analyst productivity, improved profitability, operational efficiency, and the ability to identify opportunities for growth.

ANALYTICAL CONSIDERATIONS FOR OPEN SOURCE AND SAS

|  | Open Source | SAS |
| --- | --- | --- |
| Source of data | External data; non-production data environment; unstructured and structured data | Corporate data warehouse; transactional systems; external data; production and non-production data environment; unstructured and structured data |
| Volume of data | Small data sets; spreadsheets, small data files | Small and large high-volume data sets, from ERP systems, corporate systems of record |
| Sensitivity of data | Non-corporate data; low security, open access | Non-enterprise or enterprise data; access can be tightly controlled; high corporate sensitivity |
| Governance | Not consistently available; open access; multiple algorithms available for the same techniques; not validated or reliable processes | Extensive capabilities; validated, proven and reliable processes |
| Technical Support | Available via online communities | Available directly from SAS technical support, telephone and online communities, list servers, online self help |
| Data Management | Not available from one source; not core to analysis capabilities | Extensive capabilities |

While the use of open source analytics is growing, organizations continue to use SAS for strategic and operational analysis of corporate data in production environments. In these organizations, SAS is the “system of record”, and the analytics and information derived are a trusted part of the decision-making process. Meanwhile, open source analytical assets are rising in popularity for specific types of analytics tasks for a number of reasons, including the amount of open source training in the educational environment, a shrinking pool of SAS experts due to attrition, and the perceived lower cost of open source.

Many see open source as a low investment option for standalone, non-production research. Open source allows analysts to conduct analysis on data not yet part of the enterprise or production data environment, and uncover new approaches that could be incorporated into the production environment as appropriate.

Comparing the licensing differences between open source and traditional software can initially be misleading, but with further investigation, the merits of each can easily be understood. Open source software is perceived as a low-cost approach, providing the code to the analyst to do with as necessary. Packaged software from a vendor takes the code approach to the next level, where customers access not just the code, but also integrated software capabilities, training, roadmaps, customer support, and other legal and operational benefits. Many SAS customers stated that the ability to leverage integrated capabilities such as data management, extraction, security, and governance, as well as support and training, met their corporate standards and IT requirements and was the approach best suited for operational and production analysis.

To better understand the evolving analytics landscape and the dynamics between SAS and open source analytics, Nucleus analyzed the experiences of several SAS customers, examining the business needs associated with their analytics solutions, their experiences with open source solutions, and the benefits they’ve gained from both technology strategies. These customers ranged in size from 17,000 employees and $3 billion in revenue to more than 300,000 employees and $109 billion in revenue.

WHY COMMERCIAL SOFTWARE

Nucleus found there were several reasons companies choose to use commercial software packages, such as SAS, for operational and production analysis instead of open source: scalability, performance, governance, security, and user support.

SCALABILITY AND PERFORMANCE

The organizations Nucleus analyzed stated they required a solution that was able to process, analyze, and manage large amounts of data. For all of them, SAS has a proven ability to scale and handle the volumes of production data they process for their analyses. Open source solutions did not yet have a demonstrable ability to scale and perform to the level that many of these organizations required for production analysis. Customers said:

“We use very large data sets – big data. Anything we use must efficiently process those data sets. In the future we’ll be working with big data appliances and large volumes of unstructured data. We have no concerns regarding SAS being able to handle these future requirements, and know we can pull in experts as required.”

“SAS has much better performance against large data sets.”

“SAS has better scalability than R.”

CUSTOMER PROFILE: NORTH AMERICAN TELECOMMUNICATIONS SERVICE PROVIDER

This US-based organization uses SAS and open source for analysis in several different departments, and in most cases, both SAS and open source are used together. The usage pattern is based upon the skill set of the personnel within the department, as well as the business need and the resources available. Both are used because:

- SAS has the proven scalability, reliability, and performance that departments analyzing high-volume, structured data require.

- In departments that leverage external and internal data sources for their analysis, a mix of open source and SAS is used. The security requirements of the data, the size of the data set, the research and analysis performed, and the skill set of the individuals performing the work all drive the distribution and usage of the tools.

Open source and SAS are currently used throughout the company. Open source is generally used for data sets not controlled by the corporate data security rules and regulations. Open source gives analysts the flexibility to do ad-hoc analysis outside of standard IT policies, while SAS provides the predictability and security that corporate governance requires.

“The proven capability, security and the continuity of SAS is its strength within our organization. Open source definitely has its place, and will continue to work in tandem with our SAS implementation.”

− Principal Analyst, Business Systems

DATA MANAGEMENT

The customers Nucleus analyzed stated that the ability to manipulate, manage, and integrate many diverse data sources was important to their analytics and business requirements. Open source solutions did not yet have the data manipulation and management capabilities that many of these organizations required. Customers said:

“R is not designed to acquire, manage or manipulate data. Open source is all about developing that analysis, where SAS is all about the data.”

“We use SAS for analytics, data extraction, and data management. Open source cannot do this.”

“We can easily integrate new functions and manipulate large volumes of data with SAS. We can’t do this with open source.”

“We use SAS for the combination of data manipulation and connectivity to multiple data sources it supports. This data connectivity, married with the stored procedures, provides us the ability to perform advanced analytics.”

CUSTOMER PROFILE: MULTINATIONAL FOOD PROCESSING COMPANY

This multinational company has been using SAS globally for many years. It continues to use SAS Analytics for production and operational analysis because:

- Its Information Systems / Information Technology (ISIT) team has stringent corporate security, compliance, and administrative requirements for all software used at a global and production level.

- SAS has better scalability and performance for production analysis of very large volumes of data.

- SAS is used for data management and data manipulation, which are not currently available in open source.

While open source software does not meet these requirements, it is used in many divisions for non-production work in the investigation of new markets. It is also used in smaller projects, as the data for this type of analysis is usually from outside sources and not part of the operational systems, and as a result is not strictly controlled by the ISIT team.

“Every year we take a critical look at our implementation, and while for non-commercial work, R provides user flexibility for us, only SAS meets our strict compliance and data security requirements.”

− Demand Planning Specialist & Statistician

GOVERNANCE

Security and governance ranked very high for these organizations. The ability to control and view who is accessing the data, who is running analysis, the validity and accuracy of the algorithms, and who is executing them was very important from a corporate security perspective. Open source was unable, at this point, to provide that level of information or to meet the stringent regulatory, legal, and security requirements of external regulatory bodies and internal corporate management teams. Customers said:

“Open source is based on fragments. There is no control or governance on those fragments, and they can be changed, altered or even taken away. With SAS, that never happens. You can have the same confidence that something you wrote 10 years ago will still run, just like the code you wrote last week.”

“While we do have R in house, only standalone work is done in R. We would be concerned with respect to data security if using R on our production systems.”

“We are a multi-national corporation, and our Information Systems / Information Technology (ISIT) team is quite strict – and has a very high demand for control. We must align with strict rules regarding compliance, governance, access, security, and administration. We can match these rules in SAS. We can’t do that with open source.”

For all of the organizations Nucleus analyzed, the fact that SAS, as a company, could be held accountable was an important factor behind the decision to maintain and use SAS for their analytics. Open source solutions met their internal needs for a tool that would be used for research and testing analysis on non-production datasets. These large enterprise organizations could not afford, from a legal, operational, or regulatory perspective, to use software for strategic and operational decision making that did not have a vendor behind it to provide support, maintenance, or product roadmaps.

Many expressed concerns about the lack of a legal entity behind open source, and the inability to have confidence in a partnership with a vendor. Customers said:

“SAS is more established, and there are already legal, business, and support processes in place to rely on.”

“Security is a part of the risk for considering using open source in production. If something were to happen – there really is no one that can be held accountable with open source. If there is a breach of data – we would require a reliable company to work with and hold responsible.”

“We can rely on SAS as a partner.”

TRAINING, DOCUMENTATION AND SUPPORT

Trusted and expert customer support was very important. Being able to confirm, validate, and trust the expertise was key for 100% of the organizations. They all stated that the ability to work with a true customer support organization, and to contact and speak with an expert who understood the models and algorithms, provided the confidence, trust, security, and reliability these organizations required. Customers said:

“Open source doesn’t offer the training that SAS does. Training courses and user groups are important. SAS is well established in the market, and it would be very hard to replace the level of training.”

“If anything went wrong, and you couldn’t find the solution to code problems in R – you had to go to the forums to ‘hopefully’ find the solution. Not so in SAS. You are able to get reliable support and solutions for problems.”

The reliability and predictability of the SAS algorithms were of the greatest importance to the organizations surveyed for strategic, production, and operational analysis. They trusted the algorithms and knew they were proven, validated, and well documented; they could trust the quality of the results; and, most importantly, they knew an expert support organization could be contacted to provide any assistance. As a result, the organizations knew their analysts would lose minimal productivity having to uncover or troubleshoot algorithms for production use. Customers said:

“I don’t know who is writing the algorithms in R. SAS algorithms are proven and fully documented.”

“R provides lots of choice, but in many cases too much choice. Code isn’t always well thought out, and there isn’t continuity in the ‘streams’. SAS is predictable, reliable, and proven.”

“With SAS, you have that core base product that everything is spun off. This provides a level of extensibility, core knowledge, and scalability that open source can’t give.”

CUSTOMER PROFILE: GLOBAL HOSPITALITY COMPANY

This global organization uses SAS for production and operational analysis, as well as data management. It continues to use SAS Analytics for production and operational analysis because:

- Data security is important, and open source does not meet the requirements of its Information Systems / Information Technology team.

- Open source is constrained with respect to the amount of data it can process. The volume of data required for analysis demands the scalability and performance capabilities found with SAS.

- SAS solutions are used across the organization for data extraction and manipulation, data management, and analytics. This functionality is not available in open source and would require a significant investment to replace with other tools and solutions.

Open source is currently used for ad-hoc work only, and the resulting analysis development is potentially leveraged in SAS. R gives analysts the ability to perform independent work with algorithms and analysis they’ve developed in R before transferring that work into SAS.

“Every year we re-evaluate, but stay with SAS because of its superior customer support, data manipulation, scalability, and performance for large data volumes that we require.”

− Director, Strategy & Analytics

WHY OPEN SOURCE

Many organizations are adopting open source analytical tools such as R and Python in some situations because the perceived low cost and ease of adoption make them valuable tools for analyzing data. The organizations surveyed for this report showed a similar trend. This is particularly true if the organization is targeting transactional data already addressed by the SAS footprint or data that is not necessarily meant for the enterprise data warehouse. Open source was leveraged by many as an important tool for its ability to perform ad-hoc research in a standalone environment. Users of open source cited increases in computing power, the ability to rapidly deploy and analyze data, and low initial cost of adoption as the main reasons for adopting it. Customers said:

“Open source offers an attractive initial pricing model. The on-line user communities have highly skilled and knowledgeable people, and for the right sized company with the right problems, open source is a good choice.”

“R is used by individuals in my company for their specialized projects. They can easily install it, and conduct research on things that may or may not become part of our production or operational systems.”

All the customers did agree that open source offers many algorithms, and flexible approaches that can be used in a variety of ways. In addition, the open source community offers a very strong source of knowledge and assistance. Customers stated that in some cases, open source was a good fit:

“Some things are easier to do in open source. You can be more creative because of the diversity of algorithms.”

“R allows our analysts to experiment and try out new analysis on a smaller scale on non-production data sets.”

“While we do have R in house, only standalone work is done in R. We would be concerned with respect to data security if using R on our production systems.”

“At this point, we use R for research work. Analysts can run tests, and research on their own machines without impacting the production systems or having to worry about security and governance issues.”

Nucleus has found that there is a place for both open source and SAS in many enterprise environments where SAS has been successfully used, sometimes for decades, to analyze data.

THE COST OF SWITCHING

Nucleus found that many organizations that had already made a significant investment in resources and skills within their SAS environment believed that, while free did “appear” cheaper, there would be significant switching costs associated with moving their current analytics footprint to open source. The main areas where companies said switching to open source would create disruption and, they believed, unnecessary expense were the cost of converting their SAS analysis to open source code, the personnel costs of such a project, lost employee productivity, and the lost business impact of time spent away from current analytics efforts. Customers said:

“It would cost us between $0.5-1M in salary alone to make the transition from SAS to open source. We would need a couple of people, an additional $300-400K, and no one would be working on creating new models, analysis or algorithms.”

“We do everything with SAS - analytics, data extraction, and data management. Across the organization – we would have to make a significant investment in other tools to move from SAS.”

“We can’t hire an army of people to build a new environment. We have many different ways to make the SAS tools work, and it is fully integrated to other systems. The effort required to move away from SAS is our biggest concern.”

“Our industry is very specific as to how analytics are done, and the algorithms that are used. Rules and legislation are very well defined from a maintainability perspective as there are a lot of standards and structure. The use of SAS is a requirement.”
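
As a rough illustration of the first estimate quoted above, the two ranges that customer cited can be combined into a single range of direct switching costs. This is a minimal sketch in Python, and it rests on an assumption: that the "additional $300-400K" is incurred on top of the "$0.5-1M in salary", which the quote suggests but does not state outright. The function name is hypothetical, and the result leaves out the lost productivity and business impact that customers also cited.

```python
# Illustrative arithmetic only; the dollar figures come from the customer quote.
# Assumption: the "additional $300-400K" is additive to the salary estimate.
SALARY_RANGE = (500_000, 1_000_000)    # "$0.5-1M in salary alone"
ADDITIONAL_RANGE = (300_000, 400_000)  # "an additional $300-400K"


def combined_range(*ranges):
    """Sum the low ends and the high ends of several (low, high) cost ranges."""
    low = sum(r[0] for r in ranges)
    high = sum(r[1] for r in ranges)
    return low, high


low, high = combined_range(SALARY_RANGE, ADDITIONAL_RANGE)
print(f"Estimated direct switching cost: ${low:,} to ${high:,}")
```

Under that assumption, the direct cost for this one customer would fall between $0.8 million and $1.4 million before any indirect costs are counted.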

CONCLUSION

Many organizations balance open source solutions with SAS to meet the growing need for statistical analysis both within their organizations and externally with regulatory bodies. Open source analytic tools offer hundreds of ways to execute an analysis, while SAS offers the performance, scalability, security, and governance many organizations require for production and operational analytics. Organizations choose SAS for the customer support that enterprise-sized organizations require and for its ability to provide high-caliber training and documentation. Nucleus found that in most of these organizations, the use of open source and SAS is not an either-or situation, but one where the two environments are able to augment each other and drive greater benefit for the business. In choosing the best analytics approach for a particular task, considering the source, volume, and sensitivity of data will help organizations ensure they maximize returns from both analytics approaches while making the most of their existing SAS investment.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. 107100_S125006.0514