Big Data Database Technology (민형기, 2014. 2. 5.), [SSA] Big Data Analytics

[SSA] 01.bigdata database technology (2014.02.05)

  • 2. Contents: I. … II. … III. …
  • 4. 1956: IBM … (RAMAC) 5 MB, 5 …; 2011: 2 TB, 70 …; CPU …; 2010: 50 …, N … (PC, …, …, TV) … Sources: 1) http://en.wikipedia.org/wiki/Memory_storage_density#Effects_on_price 2) MGI (McKinsey Global Institute), 2011.06, "Big data: the next frontier for innovation, competition, and productivity"
  • 5. … 300 (McKinsey, 2011); … 40% (McKinsey, 2011); … 10~15 … 92%, 34% (Cisco, 2011); Twitter … 1 … active users, 2 … (Twitter, 2011); 11 … 2 … 5000 … (… CEO, 2011). Source: NIA (National Information Society Agency), 2013
  • 6. ICT … Source: NIA (National Information Society Agency), 2013
  • 7. History of big data (1/2)
    1944: Fremont Rider, … Wesleyan University librarian …
    1949: Claude Shannon …
    1961: Derek Price … 15 … "law of exponential increase"
    1996: …
    1997: M. Cox, D. Ellsworth, "Application-Controlled Demand Paging for Out-of-Core Visualization"
    Sources: http://www.hcltech.com/blogs/enterprise-application-services/history-big-data, http://biggdata.weebly.com/
  • 8. History of big data (2/2)
    2001: Doug Laney (Meta Group), Volume, Velocity, Variety …, "3D Data Management: Controlling Data Volume, Velocity, and Variety"
    2005: Tim O'Reilly …, "What is Web 2.0"
    2008: Bret Swanson & George Gilder, 2015 … 1 ZB …, 2006 … 50 …, "Estimating the Exaflood"
    2011: Martin Hilbert & Priscila López, 1986~2007 … 25% …; 1986 … 99.2% …; 2007 … 94% …
    Sources: http://www.hcltech.com/blogs/enterprise-application-services/history-big-data, http://biggdata.weebly.com/
  • 10. … (datum) … Sources: http://www.terms.co.kr/data.htm, http://www.diffen.com/difference/Data_vs_Information
  • 11. "… (noise) … (signal) …", Claude Shannon; "…", Gregory Bateson. Source: http://terms.naver.com/entry.nhn?docId=1526261&cid=3619&categoryId=3623
  • 12. Data vs. Information
    Data: raw, unorganized facts; no context; just numbers and text. Example: 51007
    Information: processed data; data with context; value added to data (summarized, organized, analyzed). Examples: 5/10/07, the date of your final exam; $51,007, the average starting salary of an account manager.
    Sources: http://www.slideshare.net/EinsteinX2/data-vs-information, http://www.diffen.com/difference/Data_vs_Information
  • 13. … [5] (SERI, 2010) … ICT … 4 … (big volume) … (big value). Source: NIA (National Information Society Agency), 2013
  • 14. Source: http://smartdatacollective.com/yellowfin/75616/why-big-data-and-business-intelligence-one-direction
  • 15. Source: Gruter BigData (2011)
  • 16. 2013 … Source: Gartner, Hype Cycle for Emerging Technologies, 2013, http://www.alibabaoglan.com/blog/gartner-hype-cycle-2014/
  • 17. Big Data Landscape (2012, Forbes). Source: http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/
  • 18. Big Data Landscape (v2.0). Source: http://www.slideshare.net/mjft01/big-data-landscape-version-20
  • 20. … Source: http://www.terms.co.kr/database.htm
  • 21. … (1/2): Persistent Storage & … (ACID)
  • 22. … (2/2). Source: http://4840895.blogspot.kr/2009/04/history-of-dbms.html
  • 23. Database Landscape. Source: 451 Group, http://gigaom.com/2012/12/20/confused-by-the-glut-of-new-databases-heres-a-map-for-you/
  • 24.
    Hadoop Analytics: SQL on Hadoop (Impala, Hive, Tajo, Drill)
    In-memory Analytics: Spark; Shark (SQL on Spark), SAP HANA
    Realtime Analytics (Stream Processing): Streaming / CEP (Esper, S4, Storm, HStreaming); Streaming SQL (StreamSQL, etc.)
    Online Transactions: NoSQL (MongoDB, HBase, Cassandra); NewSQL (MySQL Cluster, Tokutek, VoltDB, dbShards)
  • 25.
    GFS (2003): Google File System: A Distributed Storage
    MapReduce (2004): Simplified Data Processing on Large Clusters
    Sawzall (2005): Interpreting the Data: Parallel Analysis with Sawzall
    Chubby (2006): The Chubby Lock Service for Loosely-Coupled Distributed Systems
    BigTable (2006): A Distributed Storage System for Structured Data
    Paxos (2007): Paxos Made Live - An Engineering Perspective
    Colossus (2009): GFS II
    Percolator (2010): Large-scale Incremental Processing Using Distributed Transactions and Notifications
    Pregel (2010): A System for Large-Scale Graph Processing
    Dremel (2010): Interactive Analysis of Web-Scale Datasets
    Tenzing (2011): A SQL Implementation On The MapReduce Framework
    Megastore (2011): Providing Scalable, Highly Available Storage for Interactive Services
    Spanner (2012): Google's Globally-Distributed Database
    F1 (2012): The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business
    Source: Google Research publications
  • 26. … -> …
    Online Transaction:
      BigTable (2006) -> Apache HBase (NoSQL)
      Megastore (2011): BigTable + transaction + schema
      Spanner (2012): NewSQL
    Analytics:
      Dremel (2010) -> Cloudera Impala, Apache Drill (SQL on Hadoop)
      Tenzing (2011) -> Apache Hive (an SQL implementation on the MapReduce framework)
  • 28. Hadoop Ecosystem. Source: Platformday 2012
  • 29. Big Data Software Stack (Hadoop)
  • 30. BDAS (Berkeley Data Analytics Stack). Source: https://amplab.cs.berkeley.edu/software/
  • 31. Source: http://www.benstopford.com/2012/06/30/thoughts-on-big-data-technologies-part-1/
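
Slide 21 pairs persistent storage with ACID. The Python sketch below shows what atomicity and consistency mean in practice. It uses the standard-library `sqlite3` module with an in-memory database. The `accounts` table, the `transfer` function and the no-negative-balance rule are hypothetical illustrations and are not part of the original deck. Isolation and durability are not exercised here.

```python
import sqlite3
from typing import Dict


def setup_accounts() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    with conn:  # commit the initial rows
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
        )
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        # `with conn:` commits if the block finishes normally and
        # rolls back every statement in it if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                raise ValueError("unknown destination account")
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            # Consistency rule assumed for this sketch: no negative balances.
            if row is None or row[0] < 0:
                raise ValueError("transfer violates the balance rule")
        return True
    except ValueError:
        return False


def balances(conn: sqlite3.Connection) -> Dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    conn = setup_accounts()
    print(transfer(conn, "alice", "bob", 30), balances(conn))
    print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The second transfer would overdraw the account. The exception raised inside `with conn:` therefore rolls back both `UPDATE` statements together, and the balances stay exactly as the first, committed transfer left them.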
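
Slide 24 groups Streaming / CEP engines (Esper, S4, Storm, HStreaming) under realtime analytics. These engines compute aggregates over windows of an unbounded stream rather than over a stored dataset. The sketch below illustrates that idea with a tumbling (fixed, non-overlapping) window count in plain Python. It does not use or imitate the API of any listed engine, and the `(timestamp, key)` event format is an assumption made for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window: int
) -> Iterator[Tuple[int, Counter]]:
    """Count events per key in fixed, non-overlapping time windows.

    `events` yields (timestamp_seconds, key) pairs, assumed to arrive in
    timestamp order. A window's counts are emitted as soon as an event
    from a later window arrives, so the stream is never stored in full.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    current_start: Optional[int] = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window  # start of the window this event falls in
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, counts  # previous window is closed
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last open window


if __name__ == "__main__":
    stream = [(0, "login"), (3, "click"), (9, "click"),
              (10, "click"), (14, "login"), (25, "click")]
    for start, counts in tumbling_window_counts(stream, window=10):
        print(start, dict(counts))
```

The function assumes ordered input and skips windows that receive no events. Real stream processors also handle late or out-of-order events, which this sketch leaves out.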
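
Slide 25 lists MapReduce (2004), "Simplified Data Processing on Large Clusters". The model splits a job into two functions:

- a map function that emits intermediate key/value pairs;
- a reduce function that merges all values sharing a key.

In between, the framework groups (shuffles) the pairs by key. The single-process Python sketch below shows that flow for word counting, the example used in the paper. The function names are hypothetical, and the sketch does not model the distribution, partitioning or fault tolerance of a real cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key (the framework's job in a real system)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: merge every value that shares one key."""
    return word, sum(counts)


def word_count(docs: Dict[str, str]) -> Dict[str, int]:
    # Run map over every input record, then shuffle, then reduce once per key.
    intermediate = (
        pair for doc_id, text in docs.items() for pair in map_fn(doc_id, text)
    )
    grouped = shuffle(intermediate)
    return dict(reduce_fn(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = {"d1": "big data big value", "d2": "data velocity data variety"}
    print(word_count(docs))
    # {'big': 2, 'data': 3, 'value': 1, 'variety': 1, 'velocity': 1}
```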
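
Slide 26 traces BigTable (2006) to Apache HBase. The BigTable paper describes its data model as a sparse, distributed, persistent multidimensional sorted map, indexed by row key, column key and timestamp. The toy class below is a minimal in-memory sketch of that data model only, with no distribution or persistence. The class and method names are hypothetical and are not the HBase client API. The usage loosely follows the paper's web-table example, which uses reversed URLs as row keys.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class ToyBigTable:
    """Toy in-memory model of a BigTable-style sorted map:
    (row key, column "family:qualifier", timestamp) -> value.
    Hypothetical sketch; not the HBase or Bigtable client API."""

    def __init__(self) -> None:
        # row key -> column -> timestamp -> value (sparse: only written cells exist)
        self._rows: Dict[str, Dict[str, Dict[int, bytes]]] = {}
        # row keys kept in sorted order so range scans come out ordered
        self._sorted_keys: List[str] = []

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)
            self._rows[row] = {}
        self._rows[row].setdefault(column, {})[ts] = value

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[bytes]:
        """Return the newest version at or before `ts` (newest overall if ts is None)."""
        versions = self._rows.get(row, {}).get(column, {})
        eligible = [t for t in versions if ts is None or t <= ts]
        return versions[max(eligible)] if eligible else None

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, Dict[str, bytes]]]:
        """Yield rows with start <= key < stop, with the newest version of each column."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, stop)
        for row in self._sorted_keys[lo:hi]:
            cells = self._rows[row]
            yield row, {col: vers[max(vers)] for col, vers in cells.items()}


if __name__ == "__main__":
    table = ToyBigTable()
    table.put("com.cnn.www", "contents:", 3, b"<html>v3")
    table.put("com.cnn.www", "contents:", 5, b"<html>v5")
    table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
    table.put("com.example", "contents:", 1, b"<html>ex")
    print(table.get("com.cnn.www", "contents:"))        # newest version
    print(table.get("com.cnn.www", "contents:", ts=4))  # version as of ts=4
    print([row for row, _ in table.scan("com.cnn", "com.d")])
```

Keeping row keys sorted is what makes the reversed-URL trick useful: pages from the same domain sit next to each other, so one range scan reads them together. This sketch compares Python strings, whereas real systems order raw bytes.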