DESCRIPTION
In many places Big Data is still the new and unknown thing that is not a top priority for IT, because "we don't have large data volumes". But Big Data is much more than large data volumes. At Chr. Hansen A/S, the Research and Development (Innovation) department has worked with the value of data and, as a result, established an interdisciplinary bioinformatics program on Big Data technologies from Microsoft.
OUR NEXT SPEAKER
Microsoft Next
Kåre Buch Petersen, Chr. Hansen
Innovation with Big Data – Chr. Hansen's experiences
Big Data – the new black… and it puts data on the (whole) company's agenda
Big Data - Why?
REAL-TIME ENTERPRISE
EXPLODING DATA VOLUMES
COMPETITION
MULTIPLE DEVICES
BUSINESS COMPLEXITY
NEW DATA SOURCES
FAST CHANGING WORLD
(Big) Data Sources
Figure: data sources (ERP, Webshop, Web Logs, Emails, Click Streams, Likes, Sensors, Tweets), grouped as Transactions, Interactions and Observations. Axes: data volumes; data variety and complexity.
The four V’s of Big Data
Volume: Data explosion. Multi-layered architecture. Non-linear scalability.
Velocity: Data changes rapidly. Events in new pace. Decision window.
Variety: Many data formats. Complex integration. Non-structured sources.
Variability: Variable interpretations. Enriching existing views. Virtual models.
“BIG” DATA
Information Use Cases: Create transparency. Enable experimentation. Customize actions. Automate decisions. Innovate new business models.
Advanced Analytics: Decision engines. Complex Event Processing. Visualization. Data Mining. Information Retrieval.
Big Data Technologies: MPP/Appliances. Streaming. Unstructured. In-Memory. Map/Reduce.
Data: Transactions. Interactions. Observations.
Our views on Big Data
BIG DATA in Chr. Hansen
The elephant ride
Take away from this session
How Chr. Hansen transforms data into business.
Henry Ford: “If I had asked people what they wanted, they would have said faster horses.”
Big data is not hard, so try it out!
Big data is like teenage sex: “everyone talks about it, nobody really knows how to do it, everyone thinks everyone is doing it…” (source: beyondanalysis)
HDInsight learnings
Take out complexity and high initial cost by using a hybrid cloud setup.
Chr. Hansen in a few words
Founded in 1874 in Copenhagen by Danish pharmacist Christian D.A. Hansen
We mainly produce cultures and dairy enzymes, probiotics and natural colors
A global supplier of bioscience based ingredients to the food, health, pharmaceutical and agricultural industries
Our leading market positions stem from innovative products and production processes, long-term customer relationships and intellectual property
Scientific data is a highly valuable asset, ensuring innovation and future business
WHAT – WHY – WHO – WHEN
WHAT:
A BIG DATA solution which extracts data from our Electronic Laboratory System for use in different reporting and visualization tools (MATLAB, SIMCA, MS Excel)
WHY:
More automated equipment at Chr. Hansen (including robots, advanced detectors, and other devices) produced a growing volume of complex data
We lacked an efficient way to capture and process the data and make it available for use in diverse contexts. Moreover, manually collecting and analyzing the data in spreadsheets was labor-intensive and time-consuming
WHO:
Innovation (R&D) together with Global IT and external vendors (MS and Platon)
A world of unstructured data
Imagine your IT landscape:
Without a BI system – no cubes
Where your ERP data only exists in documents or sheets – no relational tables
Where the documents are not based upon a template or other standards – no data structure
Where you generate new types of data on a frequent basis – many data sources
Where some documents are uploaded to a SharePoint document list and others are stored on local file systems – lack of overview.
And just to add some more complexity, the data should be processed with different algorithms before being presented to the end user.
This is the daily life of a scientist, and probably of other user groups too.
Now imagine your IT department had to build a reporting system under the above assumptions. What to do?
The solution – dataflow
From manually collecting and preparing data to...
Challenge one – Say yes
“We need a system that can extract any scientific data and present the data as the scientist requests. Can you help us?”
Challenge two – Unknown territory
BIG DATA is more than BIG VOLUME
Take out complexity – think BIG, build simple
BIG DATA isn’t a magic wand that can solve all your traditional data issues
Challenge three – Unstructured data
People generate complexity and context-dependent data.
We cannot control the world, but we can advise on how to gain control (where it’s needed)
Unstructured data – what to do with it?
We developed a simple model to atomize and transpose data into one known data model
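The slides do not show the model itself, so the following is only a minimal sketch of what "atomize and transpose" could look like, assuming it means turning wide rows of any layout into one long record shape. The `Atom` type, the `atomize` function and the two sample sources (`ph_log`, `plate_reader`) are hypothetical illustrations, not Chr. Hansen's implementation.

```python
from typing import Any, Dict, Iterable, List, NamedTuple


class Atom(NamedTuple):
    """One atomic observation: a single value together with its context."""
    source: str     # hypothetical: name of the originating file or instrument
    record: int     # row index within that source
    attribute: str  # original column name
    value: Any      # the cell value, left untyped


def atomize(source: str, rows: Iterable[Dict[str, Any]]) -> List[Atom]:
    """Transpose wide rows of any shape into long (source, record, attribute, value) atoms."""
    atoms = []
    for record, row in enumerate(rows):
        for attribute, value in row.items():
            if value is None:
                continue  # skip empty cells
            atoms.append(Atom(source, record, attribute, value))
    return atoms


# Two hypothetical lab outputs with different, unrelated layouts.
ph_log = [{"sample": "A1", "pH": 4.6}, {"sample": "A2", "pH": 4.4}]
plate_reader = [{"well": "B3", "od600": 0.82, "temp_c": 37.0, "note": None}]

# Both sources end up in the same four-field shape.
atoms = atomize("ph_log", ph_log) + atomize("plate_reader", plate_reader)
for atom in atoms:
    print(atom)
```

Because every source ends up in the same four-field shape, a new data source with a new layout needs no new schema; downstream tools can filter on `source` and `attribute` instead.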
The solution – data layers and technologies
Outcome one: More collaborative organization with a common and broader mindset
Outcome two: Changing the world as we know it
Short term outcome:
Automation and optimization of data processing – ”free the scientist”.
Making data accessible for use in a broad context - ”set data free”
Long term capabilities:
A new way to organize, transform and visualize data and information – ”from islands of data to integration of knowledge”
Realization of the full value potential in data – ”transforming data to business”.
Present status:
Still in the pilot phase, but the response and feedback from the involved scientists have been extremely positive, and an eye-opener as to how IT can facilitate Innovation!
HDInsight from a Line of Business view
“If we had to purchase servers, storage devices, and software, and install it all in-house, it would have been a very different and a much more long-term project… It was simply so much faster to do this in the cloud with Windows Azure. We were able to implement the solution and start working with data in less than a week.”
HDInsight learnings
We used HDInsight to minimize infrastructure-related complexity and ensure a low setup cost.
It worked perfectly in a prototyping setup: in less than half an hour we had a running Hadoop distribution, and it has been running ever since with no unannounced downtime.
You still need to define an infrastructure architecture fitted to your organizational needs, and you need internal resources to open ports, ensure bandwidth etc.
Not all Hadoop tools are supported on HDInsight – however, those we have used so far are (Hive, Pig).
Low entry price, and should we decide to bring it in-house, the switching cost is not expected to be high. It is easier to get funding when you can exemplify and prove the value of the technology.
Some issues with opening the ports and lack of control when it comes to updates.
Get on the elephant
Don’t be afraid of new technology; we will evolve and come out stronger than before.
BIG DATA projects are too important not to have IT involved.