[SSA] 01.bigdata database technology (2014.02.05)
- 1. [SSA] Big Data Analytics Big Data Database Technology
hg.min@samsung.com 2014. 2. 5.
- 2. Contents I. II. III. 1
- 3. 2
- 4. 1956 IBM (RAMAC) 5MB 5 , 2011 2TB 70 CPU , 2010 50 N (PC, ,
, TV) , , , : 1)
http://en.wikipedia.org/wiki/Memory_storage_density#Effects_on_price
2) MGI (McKinsey Global Institute), 2011.06, Big data: the next
frontier for innovation, competition, and productivity
- 5. : 300(McKinsey, 2011) 40% (McKinsey, 2011) 10~15 92%, 34%
(Cisco, 2011) (twitter) 1 (active user) , 2 (Twitter, 2011) 11 2
5000 ( CEO , 2011) : NIA() - (2013) 4
- 6. ICT : NIA() - (2013) 5
- 7. (1/2)
  - 1944: Fremont Rider, Wesleyan University Librarian
  - 1949: Claude Shannon
  - 1961: Derek Price, 15, law of exponential increase
  - 1996:
  - 1997: M. Cox, D. Ellsworth, Application-Controlled Demand Paging for Out-of-Core Visualization
  Sources: http://www.hcltech.com/blogs/enterprise-application-services/history-big-data,
  http://biggdata.weebly.com/
- 8. (2/2)
  - 2001: Doug Laney (Meta Group), Volume, Velocity, Variety, 3D Data Management: Controlling Data Volume, Velocity, and Variety
  - 2005: Tim O'Reilly, What is Web 2.0
  - 2008: Bret Swanson & George Gilder, 2015 1 ZB, 2006 50, Estimating the Exaflood
  - 2011: Martin Hilbert & Priscila López, 1986~2007 25%, 1986 99.2%, 2007 94%
  Sources: http://www.hcltech.com/blogs/enterprise-application-services/history-big-data,
  http://biggdata.weebly.com/
- 9. 8
- 10. . , . - [ ] () datum . . , , . - [ ] . , . - [] :
http://www.terms.co.kr/data.htm,
http://www.diffen.com/difference/Data_vs_Information 9
- 11. . . - [ ] " (noise) (signal) by Claude Shannon " by Gregory
Bateson . . . - [ ] :
http://terms.naver.com/entry.nhn?docId=1526261&cid=3619&categoryId=3623
10
- 12. Data vs. Information
  - Data: raw, unorganized facts; no context; just numbers and text. Example: 51007
  - Information: processed data; data with context; value added to data (summarized, organized, analyzed). Examples: 5/10/07, the date of your final exam; $51,007, the average starting salary of an account manager.
  Sources: http://www.slideshare.net/EinsteinX2/data-vs-information,
  http://www.diffen.com/difference/Data_vs_Information
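The 51007 example on slide 12 can be sketched in a few lines: the same raw digits become different information depending on the context applied to them. This is only an illustration. The function names and the M/DD/YY reading with a 20xx century are assumptions made here, not something the slides specify.

```python
from datetime import date

RAW = "51007"  # raw data: digits with no context


def as_exam_date(raw: str) -> date:
    """Read the digits as M/DD/YY (assumed layout, assumed 20xx century)."""
    month = int(raw[0])
    day = int(raw[1:3])
    year = 2000 + int(raw[3:5])
    return date(year, month, day)


def as_salary(raw: str) -> str:
    """Read the digits as a whole-dollar amount."""
    return f"${int(raw):,}"


# The same raw value yields two different pieces of information.
print(as_exam_date(RAW).isoformat())
print(as_salary(RAW))
```

The raw string never changes; only the interpretation layered on top of it does, which is the distinction the slide draws between data and information.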
- 13. , .[5] - [] (SERI, 2010) ( ) , ICT , () , , , , , , , 4
(big volume) (big value) : NIA() - (2013) 12
- 14. :
http://smartdatacollective.com/yellowfin/75616/why-big-data-and-business-intelligence-one-direction
13
- 15. : Gruter BigData (2011) 14
- 16. 2013 : Gartner -Hype Cycle for Emerging Technologies, 2013,
http://www.alibabaoglan.com/blog/gartner-hype-cycle-2014/ 15
- 17. Big Data Landscape (2012, Forbes) :
http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/
16
- 18. Big Data Landscape (v 2.0) :
http://www.slideshare.net/mjft01/big-data-landscape-version-20
17
- 19. 18
- 20. . - [] . . - [] : http://www.terms.co.kr/database.htm
19
- 21. (1/2) Persistent Storage & (ACID) 20
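The ACID guarantee named on slide 21 can be illustrated with a minimal sketch, assuming Python's standard-library sqlite3 module. The accounts table, the account names, and the amounts are hypothetical and chosen only for the demonstration. The transfer credits the destination first so that a failed debit leaves a partial change behind, which the rollback then has to undo.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move amount from src to dst as one transaction; undo everything on failure."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balance of every account, keyed by name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = open_bank()
transfer(conn, "alice", "bob", 30)   # succeeds and is committed
transfer(conn, "alice", "bob", 500)  # violates the CHECK, so it is rolled back
print(balances(conn))
```

After the second call the balances are the same as after the first, so the partial credit to the destination account never becomes visible. That is atomicity and consistency working together; durability would additionally require a file-backed database rather than `:memory:`.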
- 22. (2/2) :
http://4840895.blogspot.kr/2009/04/history-of-dbms.html 21
- 23. Database Landscape : 451 Group,
http://gigaom.com/2012/12/20/confused-by-the-glut-of-new-databases-heres-a-map-for-you/
22
- 24.
  - Hadoop Analytics: SQL on Hadoop (Impala, Hive, Tajo, Drill)
  - In-memory Analytics: Spark, Shark (SQL on Spark), SAP HANA
  - Realtime Analytics (Streaming Processing): Streaming / CEP (Esper, S4, Storm, HStreaming), Streaming SQL (StreamSQL etc.)
  - Online Transactions: NoSQL (MongoDB, HBase, Cassandra), NewSQL (MySQL Cluster, Tokutek, VoltDB, dbShards)
- 25.
  - GFS (2003): Google File System: A Distributed Storage
  - MapReduce (2004): Simplified Data Processing on Large Clusters
  - Sawzall (2005): Interpreting the Data: Parallel Analysis with Sawzall
  - Chubby (2006): The Chubby Lock Service for Loosely-Coupled Distributed Systems
  - BigTable (2006): A Distributed Storage System for Structured Data
  - Paxos (2007): Paxos Made Live - An Engineering Perspective
  - Colossus (2009): GFS II
  - Percolator (2010): Large-scale Incremental Processing Using Distributed Transactions and Notifications
  - Pregel (2010): A System for Large-Scale Graph Processing
  - Dremel (2010): Interactive Analysis of Web-Scale Datasets
  - Tenzing (2011): A SQL Implementation On The MapReduce Framework
  - Megastore (2011): Providing Scalable, Highly Available Storage for Interactive Services
  - Spanner (2012): Google's Globally-Distributed Database
  - F1 (2012): The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business
  Source: Google Research
- 26. ->
  - Online Transaction
    - BigTable (2006) -> Apache HBase; NoSQL
    - Megastore (2011) -> -; BigTable + transaction + schema
    - Spanner (2012) -> -; NewSQL
  - Analytics
    - Dremel (2010) -> Cloudera Impala, Apache Drill; SQL on Hadoop
    - Tenzing (2011) -> Apache Hive; An SQL implementation on MapReduce framework
- 27. 26
- 28. Hadoop Ecosystem : Platformday 2012 27
- 29. BigData Software Stack (Hadoop) 28
- 30. BDAS(Berkeley Data Analytics Stack) :
https://amplab.cs.berkeley.edu/software/ 29
- 31. :
http://www.benstopford.com/2012/06/30/thoughts-on-big-data-technologies-part-1/
30
- 32. . 31