
RRE (Revolution R Enterprise) vs. R at PC Cluster


112/04/19 1

Edward Cheng, 2.18.2014

PC Cluster

Environment

• node01~node36, stathpc: RHEL 5 + RRE 6.1 (R-2.14.2)

• node51~node60, himemhpc: RHEL 6 + RRE 7.0 (R-3.0.2)

History

• R originated in 1993 with Professors Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.

• Revolution Analytics (www.revolutionanalytics.com) received investment in 2008 from Intel Capital and other venture capital firms. Its board members include Professor Robert Gentleman (R founder) and advisor Norman H. Nie (former SPSS CEO).

• Revolution R Enterprise (the enterprise edition of R)

R

• R is the world’s most widely used statistical programming language.

• Free and open source software

R usage

R package growth

Why Revolution R

Performance

Benchmark                          R-2.14.2    RRE 6.1    R-3.0.1    RRE 7.0
Matrix multiply (10000 x 10000)    751 sec     35 sec     568 sec    20 sec
SVD (10000 x 10000)                5746 sec    374 sec    4549 sec   256 sec
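
The speedups implied by these timings can be worked out directly. The short Python sketch below only restates the numbers from the Performance slide; the dictionary layout and the function name `speedups` are illustrative assumptions, and Python is used purely for the arithmetic.

```python
# Timings in seconds, copied from the Performance slide.
# The keys and structure are illustrative, not part of any benchmark tool.
TIMINGS = {
    "Matrix multiply": {"R-2.14.2": 751, "RRE 6.1": 35, "R-3.0.1": 568, "RRE 7.0": 20},
    "SVD": {"R-2.14.2": 5746, "RRE 6.1": 374, "R-3.0.1": 4549, "RRE 7.0": 256},
}

# Each pair is (open-source R build, RRE release compared against it).
PAIRS = [("R-2.14.2", "RRE 6.1"), ("R-3.0.1", "RRE 7.0")]


def speedups(timings, pairs):
    """Return {(benchmark, r_build, rre_build): r_time / rre_time}."""
    result = {}
    for benchmark, row in timings.items():
        for r_build, rre_build in pairs:
            result[(benchmark, r_build, rre_build)] = row[r_build] / row[rre_build]
    return result


if __name__ == "__main__":
    for (benchmark, r_build, rre_build), ratio in speedups(TIMINGS, PAIRS).items():
        print(f"{benchmark}: {rre_build} is about {ratio:.1f}x faster than {r_build}")
```

On these figures, RRE comes out roughly 15x to 28x faster than the corresponding R build, depending on the benchmark.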

Big Data is coming

Definition

• “Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it…

Bytes

Big Data

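• In 2011, the worldwide volume of digital data was about 1.8 ZB (1 ZB = 2^70 bytes). A study by IDC (International Data Corporation) forecasts that the total will reach about 35.2 ZB by 2020, which is 44 times the 2009 level of about 0.8 ZB.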
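
The unit conversion and the growth multiple can be checked with a few lines of arithmetic. In this Python sketch the constant and function names are hypothetical; the figures are the ones quoted on the slide.

```python
# 1 ZB as defined on the slide: 2 to the power of 70 bytes.
ZB = 2 ** 70

# Figures quoted on the slide (constant names are illustrative).
VOLUME_2011_ZB = 1.8
FORECAST_2020_ZB = 35.2
GROWTH_MULTIPLE = 44


def zb_to_bytes(zettabytes):
    """Convert zettabytes (2**70-byte units) to bytes."""
    return zettabytes * ZB


def implied_baseline_zb(forecast_zb, multiple):
    """Baseline volume implied by 'forecast = multiple x baseline'."""
    return forecast_zb / multiple


if __name__ == "__main__":
    print(f"1 ZB = {ZB:,} bytes")
    print(f"{VOLUME_2011_ZB} ZB is about {zb_to_bytes(VOLUME_2011_ZB):.3e} bytes")
    baseline = implied_baseline_zb(FORECAST_2020_ZB, GROWTH_MULTIPLE)
    print(f"Baseline implied by the 44x forecast: {baseline:.1f} ZB")
```

The implied baseline works out to 0.8 ZB, which is smaller than the 1.8 ZB quoted for 2011, so the 44x multiple is measured from an earlier and smaller total.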

Big Data

BIG DATA: the tsunami strikes

• 2006: 850 TB of web page data stored cumulatively

• 2009: about 220 million photos uploaded every week, requiring 25 TB of storage

• 2011: about 48 hours of video (48 GB) uploaded every minute (roughly 70 TB per day)

eBay

The world’s largest online marketplace

• We have over 50 petabytes of data

• We have over 400 million items for sale

• We process more than 250 million user queries per day

• We have over 112 million active users

• We sold over US$75 billion in merchandise in 2012

Big Problems

• Capacity: data too big to fit into memory

• Speed: computation may be too slow to be useful
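
One common response to the capacity problem is to process data in fixed-size chunks and keep only running totals in memory. The sketch below is a minimal illustration in Python, not code from R or RRE; `chunked` and `streaming_mean` are hypothetical helper names.

```python
from itertools import islice


def chunked(iterable, chunk_size):
    """Yield successive lists of at most chunk_size items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def streaming_mean(values, chunk_size=1000):
    """Compute the mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in chunked(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data")
    return total / count


if __name__ == "__main__":
    # A generator stands in for a data source too large to load at once.
    data = (x for x in range(1, 1_000_001))
    print(streaming_mean(data, chunk_size=10_000))
```

The same pattern applies to any statistic that can be built from running sums; the chunk size trades memory use against the overhead of each pass through the loop.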

Distributed computing

RevoScaleR

• RevoScaleR Package: RevoScaleR analysis functions such as rxCube, rxLinMod, rxCovCor, rxLogit, and rxGlm provide significant speed improvements over the alternatives. These algorithms are all optimized for handling big data.
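
To give a feel for why chunk-friendly algorithms suit big data, the sketch below fits a simple linear regression by accumulating running sums one chunk at a time. It is written in Python as an illustration of the general external-memory idea only: it is not the RevoScaleR implementation, it does not call rxLinMod, and the class name `ChunkedSimpleRegression` is hypothetical.

```python
class ChunkedSimpleRegression:
    """Fit y = intercept + slope * x from data seen one chunk at a time."""

    def __init__(self):
        # Running sums are all that must stay in memory between chunks.
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xx = 0.0
        self.sum_xy = 0.0

    def update(self, xs, ys):
        """Fold one chunk of paired observations into the running sums."""
        if len(xs) != len(ys):
            raise ValueError("xs and ys must have the same length")
        for x, y in zip(xs, ys):
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_xy += x * y

    def coefficients(self):
        """Return (intercept, slope) from the ordinary least squares formulas."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return intercept, slope


if __name__ == "__main__":
    model = ChunkedSimpleRegression()
    # Two chunks drawn from the exact line y = 2 + 3x.
    model.update([0, 1, 2], [2, 5, 8])
    model.update([3, 4], [11, 14])
    print(model.coefficients())
```

Because each chunk only updates five running sums, the data never needs to fit in memory at once, and chunks could in principle be summarized independently and merged afterwards.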

Multi-threaded Processing

.xdf data format

• The XDF file format is a binary file format with an interface that optimizes row and column processing and analysis.
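
The benefit of a binary layout organized for column access can be illustrated with a toy format. The Python sketch below is not the XDF format, whose internal layout is not described in these slides; the file layout, header fields, and function names are all assumptions made for illustration.

```python
import os
import struct
import tempfile

HEADER = struct.Struct("<QQ")   # row count, column count (little-endian)
VALUE_SIZE = 8                  # each value is stored as a float64


def write_columns(path, columns):
    """Write equal-length columns of floats, one contiguous block per column."""
    if not columns:
        raise ValueError("at least one column is required")
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    with open(path, "wb") as f:
        f.write(HEADER.pack(n_rows, len(columns)))
        for col in columns:
            f.write(struct.pack(f"<{n_rows}d", *col))


def read_column(path, index):
    """Read one column by seeking straight to its block, skipping the others."""
    with open(path, "rb") as f:
        n_rows, n_cols = HEADER.unpack(f.read(HEADER.size))
        if not 0 <= index < n_cols:
            raise IndexError("column index out of range")
        f.seek(HEADER.size + index * n_rows * VALUE_SIZE)
        data = f.read(n_rows * VALUE_SIZE)
        return list(struct.unpack(f"<{n_rows}d", data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.bin")
        write_columns(path, [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
        print(read_column(path, 1))
```

Storing each column contiguously means an analysis that touches only a few variables reads only those bytes, with no text parsing; a real format would add column names, types, and row blocks on top of this idea.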
